Detailed Action

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-14 and 16-20 are pending.
Claim 15 is cancelled.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/29/2022 has been entered.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the claim recites “evaluating performance of the trained neural network on the proxy task to determine a proxy performance metric for the candidate architecture on the proxy task, the proxy performance metric approximating a real performance metric of the candidate architecture on the particular neural network task”. It is unclear what constitutes “approximating a real performance metric”. Does the claim recite a process of making a system that approximates the real performance of the neural network, or making a neural network that, using a smaller proxy task, shows performance similar to the real performance?
	For purposes of examination, the claim is interpreted as follows: the neural network, using a smaller proxy task, shows performance similar to the real performance.
	Claims 2-14 depend on claim 1 and therefore inherit the same deficiency.
Claims 16 and 20 are a system claim and a non-transitory computer storage media claim, respectively, having limitations similar to those of method claim 1. Therefore, they are rejected under the same rationale as claim 1 above. Claims 17-19 depend on claim 16 and therefore inherit the same deficiency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 1-7, 16, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
2A Prong 1: The limitation of, for each candidate architecture in the batch, determining whether the candidate architecture is accepted by all of the classifiers in the sequence of classifiers based on the score labels generated by the sequence of classifiers, is a mental process, as it merely recites passing a candidate through a sequence of classifiers. The limitation of, in response to a determination that the candidate architecture is accepted by all of the classifiers, adding the candidate architecture to a surviving set of candidate architectures, is a process that covers performance of the limitation in the mind, as it merely recites adding a candidate to a set based on the determination. The limitation of, in response to a determination, training a neural network having the candidate architecture on the proxy task, is a mental process, as it recites training a neural network using a proxy task in response to a result. The limitation of, in response to a determination, evaluating performance of the trained neural network on the proxy task to determine a proxy performance metric for the candidate architecture on the proxy task, the proxy performance metric approximating a real performance metric of the candidate architecture on the particular neural network task, is a mental process, as it merely recites evaluating the performance of neural networks on the specific task.
The limitation of selecting, based on the proxy performance of the candidate architectures in the surviving set of candidate architectures, a candidate architecture from the surviving set of candidate architectures as the final architecture for the neural network for performing the particular neural network task, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas (MPEP 2106.04(a)(2)(III)). Accordingly, the claim recites an abstract idea.
	2A Prong 2: The judicial exception is not integrated into a practical application because the claim does not recite anything other than selecting a candidate architecture using the sequence of classifiers, which is an abstract idea. The limitations of maintaining a sequence of classifiers, wherein each classifier in the sequence has been trained to process an input candidate architecture and assign a score label to the input candidate architecture that defines whether the input candidate architecture is accepted or rejected from further consideration, and of repeatedly performing the sampling, from a search space defining a plurality of architectures, of a batch of candidate architectures for the neural network for performing the particular neural network task, are insignificant extra-solution activity (MPEP 2106.05(g)).
2B: The claim does not recite additional elements that amount to significantly more than the judicial exception. The limitation of wherein the score label assigned to the input candidate architecture is a prediction of how well the input candidate architecture would perform on a proxy task that is less computationally expensive than the particular neural network task merely recites a field of use and technological environment (MPEP 2106.05(h)). The limitation of maintaining a sequence of classifiers, wherein each classifier in the sequence has been trained to process an input candidate architecture and assign a score label to the input candidate architecture that defines whether the input candidate architecture is accepted or rejected from further consideration, is mere data storage and gathering (MPEP 2106.05(g)). The limitation of repeatedly performing the sampling, from a search space defining a plurality of architectures, of a batch of candidate architectures for the neural network for performing the particular neural network task is well-understood, routine, and conventional (MPEP 2106.05(d)(I)(2)), as sampling from a space is a common process as taught by Grinsven, which provides Berkheimer evidence ([Grinsven et al, 2016, “Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images”, page 1273, right column, paragraphs 2 and 3] “CNN training process is a sequential process requiring many iterations (or epochs) to optimize the network parameters and learn discriminative features [2]. In every epoch, a subset of samples is randomly selected from the training data and is presented to the network to update its parameters through backpropagation, minimizing a cost function.” This paragraph discloses the common CNN training process that involves sampling from the dataset.).

	Regarding Claim 16, the claim recites a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations. These elements are generic computer components, because the claim language merely recites generic computers and storage devices without any detail. Claim 16 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

	Regarding Claim 20, the claim recites one or more non-transitory computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations. These elements are generic computer components, because the claim recites computer storage media without any detail. Claim 20 is a non-transitory computer storage medium claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

	Regarding Claim 2, the limitation of wherein the sequence of classifiers are trained to assign score labels that indicate that input candidate architectures having high proxy performance metrics on the proxy task are accepted and input candidate architectures having low proxy performance metrics on the proxy task are rejected from further consideration merely recites a field of use and technological environment (MPEP 2106.05(h)).
The judicial exception is not integrated into a practical application. The claim does not recite additional elements that amount to significantly more than the judicial exception.

	Regarding Claim 3, 
2A Prong 1: The limitation of wherein selecting the candidate architecture from the surviving set of candidate architectures as the final architecture comprises: selecting, from the surviving set of candidate architectures, N candidate architectures having highest proxy performance metrics on the proxy task, amounts to selecting the candidates with the highest performance, which corresponds to a mental process that “can be performed in the human mind, or by a human using a pen and paper” (MPEP 2106.04(a)(2)(III)). For example, the process corresponds to an employer selecting N candidates by scoring them and selecting those who received the highest scores. Accordingly, the claim recites an abstract idea.
The limitation of determining, for each of the N candidate architectures, a respective real performance metric of the candidate architecture on the particular neural network task, amounts to determining whether the candidate architecture satisfies a performance metric, which is a mental process that can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, the process corresponds to an employer scoring candidates according to specific scoring criteria. Accordingly, the claim recites an abstract idea.
The limitation of selecting the candidate architecture having a highest real performance metric as the final architecture is a mental process that can be performed in the human mind, or by a human using a pen and paper (MPEP 2106.04(a)(2)(III)). For example, the process corresponds to an employer scoring candidates and selecting the candidate who received the highest score. Accordingly, it does not impose any meaningful limits on practicing the abstract idea.
2A Prong 2: The judicial exception is not integrated into a practical application. The limitation that a candidate architecture with a high performance metric is accepted and one with a low performance metric is rejected does not integrate the exception into a practical application, because the claim does not recite anything other than insignificant extra-solution activity (MPEP 2106.05(g)). The claim is directed to an abstract idea.
2B: The claim does not recite additional elements that amount to significantly more than the judicial exception. Merely providing detail about the determining and selecting processes cannot provide an inventive concept. The claim is directed to an abstract idea.
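For illustration only, the two-stage selection discussed for claim 3 (keep the N candidates with the highest proxy metrics, measure the real metric only for those N, then pick the best) can be sketched as follows. The function name `select_final_architecture`, the dictionary-based metric tables, and all numeric values are hypothetical and are not taken from the application or the cited references.

```python
from typing import Callable, Dict, List


def select_final_architecture(
    proxy_metrics: Dict[str, float],
    real_metric_fn: Callable[[str], float],
    n: int,
) -> str:
    """Pick the top-n survivors by proxy metric, then the best by real metric."""
    # Stage 1: keep the n candidates with the highest proxy performance metrics.
    top_n: List[str] = sorted(proxy_metrics, key=proxy_metrics.get, reverse=True)[:n]
    # Stage 2: evaluate only those n candidates on the (more expensive) real task.
    real_metrics = {arch: real_metric_fn(arch) for arch in top_n}
    # Stage 3: the final architecture is the one with the highest real metric.
    return max(real_metrics, key=real_metrics.get)


# Hypothetical values, for illustration only.
proxy = {"arch_a": 0.71, "arch_b": 0.80, "arch_c": 0.78, "arch_d": 0.55}
real = {"arch_a": 0.90, "arch_b": 0.84, "arch_c": 0.88, "arch_d": 0.95}
final = select_final_architecture(proxy, real.__getitem__, n=2)
print(final)
```

In this sketch, `arch_d` has the best real metric but is never evaluated on the real task because its proxy metric excludes it from the top N.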

	Regarding Claim 4, 
2A Prong 1: The limitation of repeating the operations until the number of candidate architectures sampled from the search space reaches the predetermined threshold number is a mental process that “can be performed in the human mind, or by a human using a pen and paper” (MPEP 2106.04(a)(2)(III)), because it merely provides detail about how many times to repeat the specific operation. Accordingly, the claim recites an abstract idea.
2A Prong 2: The judicial exception is not integrated into a practical application. The claim does not recite a practical application because it does not recite anything other than detail about the number of repetitions of the operation, which is an insignificant extra-solution activity (MPEP 2106.05(g)). The claim is directed to an abstract idea.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Merely providing detail about repeating the operation cannot provide an inventive concept. The claim is not patent eligible.

	Regarding Claim 5, 
2A Prong 1: The limitation of repeating the operations until the surviving set of candidate architectures reaches the predetermined threshold number is a mental process that “can be performed in the human mind, or by a human using a pen and paper” (MPEP 2106.04(a)(2)(III)), because it merely defines repeating a specific operation a specific number of times. Accordingly, the claim recites an abstract idea.
2A Prong 2: The judicial exception is not integrated into a practical application. The claim merely provides detail about the number of repetitions of the operation, which is an insignificant extra-solution activity (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
2B: The claim does not recite additional elements that amount to significantly more than the judicial exception. Merely providing detail about repeating the operation cannot provide an inventive concept. The claim is not patent eligible.
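For illustration only, the two stopping conditions discussed for claims 4 and 5 (a threshold on the number of sampled candidates, and a threshold on the size of the surviving set) can be sketched as a single loop. Combining both conditions in one loop is a simplification made here; each claim recites only one of them. The names `search_loop`, `sample_batch`, and `accepted`, and the toy acceptance rule, are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def search_loop(
    sample_batch: Callable[[int], List[int]],
    accepted: Callable[[int], bool],
    batch_size: int,
    max_sampled: int,
    max_survivors: int,
) -> Tuple[int, List[int]]:
    """Repeat sampling until either threshold is reached."""
    sampled = 0
    survivors: List[int] = []
    # Claim 4 style condition: number of sampled candidates reaches a threshold.
    # Claim 5 style condition: size of the surviving set reaches a threshold.
    while sampled < max_sampled and len(survivors) < max_survivors:
        batch = sample_batch(batch_size)
        sampled += len(batch)
        survivors.extend(c for c in batch if accepted(c))
    return sampled, survivors


# Hypothetical search space: integers 0..99; a candidate "survives" if divisible by 10.
rng = random.Random(0)
sampled, survivors = search_loop(
    sample_batch=lambda k: [rng.randrange(100) for _ in range(k)],
    accepted=lambda c: c % 10 == 0,
    batch_size=8,
    max_sampled=40,
    max_survivors=3,
)
print(sampled, survivors)
```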

	Regarding Claim 6, the limitation of wherein sampling, from the search space, the candidate architecture for the neural network for performing the particular neural network task comprises: randomly sampling, from the search space, the candidate architecture for the neural network for performing the particular neural network task, is well-understood, routine, and conventional activity, as sampling from a space is a common process as taught by Grinsven ([Grinsven et al, 2016, “Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images”, page 1273, right column, paragraphs 2 and 3] “CNN training process is a sequential process requiring many iterations (or epochs) to optimize the network parameters and learn discriminative features [2]. In every epoch, a subset of samples is randomly selected from the training data and is presented to the network to update its parameters through backpropagation, minimizing a cost function.” This paragraph discloses the common CNN training process that involves sampling from the dataset.).
The judicial exception is not integrated into a practical application. The claim does not recite additional elements that amount to significantly more than the judicial exception.

	Regarding Claim 7, the limitation of wherein the score label is a binary score label merely recites a field of use and technological environment (MPEP 2106.05(h)).
The judicial exception is not integrated into a practical application. The claim does not recite additional elements that amount to significantly more than the judicial exception.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7, 16, and 20 are rejected under 35 U.S.C. 103 over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), and further in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”).

Regarding claim 1, Merity teaches a method of determining a final architecture for a task neural network for performing a particular neural network task ([Merity, Abstract] “A system automatically generates (i.e., determining a final architecture) recurrent neural network architectures for performing specific tasks”), the method comprising:
wherein each classifier in the sequence has been trained to process an input candidate architecture and to assign a score label to the input candidate architecture that defines whether the input candidate architecture is accepted or rejected from further consideration, wherein the score label assigned to the input candidate architecture is a prediction of how well the input candidate architecture would perform on a task ([Merity, 0052] “According to an embodiment, the candidate architecture ranking module 320 trains an architecture ranking neural network to predict performance of a given RNN architecture. The candidate architecture ranking module 320 trains the architecture ranking neural network using training data set comprising RNN architectures that were previously evaluated by the candidate architecture evaluation module 330 and their known performance scores. The training dataset may also comprise RNN architectures provided by experts along with their performance scores estimated by experts”, [Merity, 0050, line 12-15] “Accordingly, the results obtained from an RNN generated from the candidate RNN architecture with high performance score have a high likelihood of matching known results, for example, for a labelled dataset”); 
repeatedly performing the following operations: 
sampling, from a search space defining a plurality of architectures, a batch of candidate architectures for the neural network for performing the particular neural network task ([Merity, Figure 7; 0050] Fig. 7 of Merity discloses a process of receiving a plurality of candidate architectures generated by the candidate architecture generator and, repeatedly, providing a description of each architecture as input to the architecture ranking neural network and generating a performance score for the candidate architecture);
training a neural network having the candidate on the task ([Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task. The RNN architecture generator 110 forms training data set comprising architecture-performance pairs based on the result of the training and evaluation. The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420. The training data set can also be used to train the architecture generator neural network further described herein”); and
evaluating performance of the trained neural network on the task to determine a performance metric for the candidate architecture on the task, the performance metric approximating a real performance metric of the candidate architecture on the particular neural network task ([Merity, 0037] “The candidate architecture ranking module 420 may measure the performance of a candidate architecture by generating code for an RNN based on the DSL specification of the architecture and training the RNN. However, this is a slow process. Therefore, the candidate architecture ranking module 420 estimates the performance of candidate architecture by using the architecture ranking neural network.” This paragraph teaches approximating a real performance metric of the candidate architecture.
[Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task. The RNN architecture generator 110 forms training data set comprising architecture-performance pairs based on the result of the training and evaluation. The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420. The training data set can also be used to train the architecture generator neural network further described herein” Merity does not specifically teach the proxy task, but Raposo teaches measuring the performance of the proxy task.);
selecting, based on the performance metrics of the candidate in the surviving set of candidate, a candidate from the surviving set of candidate as the final architecture for the task neural network for performing the particular neural network task ([Merity, claim 1] “A method comprising: generating a plurality of candidate recurrent neural network (RNN) architectures, wherein each candidate RNN architecture is represented using a domain specific language (DSL), wherein the DSL supports a plurality of operators, wherein the representation of a particular candidate RNN architecture comprises one or more operators of the DSL; for each of the plurality of candidate RNN architectures, performing: providing an encoding of the candidate RNN architecture as input to an architecture ranking neural network configured to determine a score for the candidate RNN architecture, the score representing a performance of the candidate RNN architecture for a given particular type of task; executing the ranking neural network to generate a score indicating the performance of the candidate RNN architecture; selecting a candidate RNN architecture based on the scores of each of the plurality of candidate RNN architectures; compiling the selected candidate architecture to generate code representing a target RNN; and executing the code representing the target RNN” The proxy performance metric is taught by the reference Raposo cited below.).
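For illustration only, the flow described in the cited paragraphs of Merity (estimate performance with a ranking model, train and evaluate only the most promising candidates, and collect architecture-performance pairs) can be sketched as follows. The ranking neural network is replaced here by a plain lookup function, which is a stand-in and not Merity's model; the names `rank_then_evaluate`, `predict_score`, and `train_and_evaluate`, and all values, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rank_then_evaluate(
    candidates: Sequence[str],
    predict_score: Callable[[str], float],
    train_and_evaluate: Callable[[str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank candidates by predicted score, then measure only the top_k."""
    # Cheap step: order candidates by the estimated performance score.
    ranked = sorted(candidates, key=predict_score, reverse=True)
    promising = ranked[:top_k]
    # Expensive step: train and evaluate only the most promising candidates.
    # The resulting (architecture, performance) pairs could later serve as
    # training data for the ranking model.
    return [(arch, train_and_evaluate(arch)) for arch in promising]


# Hypothetical predicted and measured scores, for illustration only.
predicted = {"rnn_a": 0.6, "rnn_b": 0.2, "rnn_c": 0.9}
measured = {"rnn_a": 0.79, "rnn_b": 0.40, "rnn_c": 0.83}
pairs = rank_then_evaluate(
    list(predicted), predicted.__getitem__, measured.__getitem__, top_k=2
)
print(pairs)
```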
Merity does not specifically disclose maintaining a sequence of classifiers; for each candidate architecture in the batch: determining whether the candidate architecture is accepted by all of the classifiers in the sequence of classifiers based on the score labels generated by the sequence of classifiers; in response to a determination that the candidate architecture is accepted by all of the classifiers, adding the candidate to a surviving set of candidates; and performance evaluation performed on a proxy task that is less computationally expensive than the particular neural network task.
Cheng teaches maintaining a sequence of classifiers, for each candidate in the batch: determining whether the candidate is accepted by all of the classifiers in the sequence of classifiers based on the score labels generated by the sequence of classifiers ([Cheng, column 7, lines 6-27; Fig 3] “In one embodiment illustrated in FIG. 3, there are four different feature generation modules, which are then used by 5 different classifiers to constitute a 5-stage classifier cascade. The execution of the cascade proceeds by first evaluating the first classifier 314 in the cascade 304. Only the features needed by the first classifier 314 stage are generated. Classification response is denoted as h_n and is used to determine whether to proceed to the next stage of analysis. If the classifier 314 produces a response that exceeds a predefined threshold (i.e., pass), which is tuned during the training of the cascade described below, then the image patch is analyzed further by the second stage 316, and so on. If a feature is used at a later stage, such as the wavelet feature for the 2nd and 5th stages, the feature vector is copied rather than recomputed. On the other hand, if the response from the first stage 314 does not exceed the threshold (i.e., fail), the image patch is classified as a non-target and features that have not yet been computed are not computed at all for this image patch. If the image patch survives (i.e., passes) to the last and 5th stage, the image patch is classified as a target object.” The classifier response, which may or may not exceed the threshold, is interpreted as the score label.); in response to a determination that the candidate is accepted by all of the classifiers, adding the candidate to a surviving set of candidates ([Cheng, column 7, lines 6-27; Fig 3] “In one embodiment illustrated in FIG. 3, there are four different feature generation modules, which are then used by 5 different classifiers to constitute a 5-stage classifier cascade. The execution of the cascade proceeds by first evaluating the first classifier 314 in the cascade 304. Only the features needed by the first classifier 314 stage are generated. Classification response is denoted as h_n and is used to determine whether to proceed to the next stage of analysis. If the classifier 314 produces a response that exceeds a predefined threshold (i.e., pass), which is tuned during the training of the cascade described below, then the image patch is analyzed further by the second stage 316, and so on. If a feature is used at a later stage, such as the wavelet feature for the 2nd and 5th stages, the feature vector is copied rather than recomputed. On the other hand, if the response from the first stage 314 does not exceed the threshold (i.e., fail), the image patch is classified as a non-target and features that have not yet been computed are not computed at all for this image patch. If the image patch survives (i.e., passes) to the last and 5th stage, the image patch is classified as a target object.” [Cheng, column 5, lines 41-44] “The processor 106 is coupled with a memory 108 to permit storage of data and software that are to be manipulated by commands to the processor 106.” The target object is stored somewhere in memory, which is interpreted as the surviving set.).
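For illustration only, the pass/fail cascade described in the cited passage of Cheng (a response that exceeds the stage threshold passes to the next stage; a response that does not exceed it fails, and later stages are skipped) can be sketched as follows. The stage response functions, thresholds, and inputs are hypothetical stand-ins; Cheng's stages operate on image-patch features, which are not modeled here.

```python
from typing import Callable, List, Sequence, Tuple

# Each stage is a (response function, threshold) pair. Hypothetical representation.
Stage = Tuple[Callable[[float], float], float]


def passes_cascade(item: float, stages: Sequence[Stage]) -> bool:
    """Return True only if the item passes every stage, exiting on the first failure."""
    for respond, threshold in stages:
        # A response that does not exceed the threshold is a fail;
        # the remaining stages are never evaluated for this item.
        if respond(item) <= threshold:
            return False
    return True


def surviving_set(batch: Sequence[float], stages: Sequence[Stage]) -> List[float]:
    """Collect the items that are accepted by all stages of the cascade."""
    return [item for item in batch if passes_cascade(item, stages)]


# Hypothetical three-stage cascade and inputs, for illustration only.
stages: List[Stage] = [
    (lambda x: x, 0.2),
    (lambda x: x * x, 0.25),
    (lambda x: 1.0 - abs(x - 0.8), 0.7),
]
survivors = surviving_set([0.1, 0.4, 0.6, 0.9], stages)
print(survivors)
```

Here 0.1 fails at the first stage and 0.4 fails at the second, so only 0.6 and 0.9 reach the surviving set.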
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Cheng in the method of Merity, so that a sequence of classifiers is used to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would implement the function of Cheng, using the sequence of classifiers to perform neural network tasks and classify the neural architectures, as it is well known in the art to use a sequence of classifiers to search neural architectures. Using multiple classifiers usually gives better performance in evaluating a candidate than using only one classifier, because there are more evaluation steps. For example, when a company is selecting a candidate, having more than one evaluation process (i.e., classifier) helps reduce the possibility of selecting a false positive (i.e., a candidate who is not qualified).
Merity in view of Cheng fails to teach a proxy task that is less computationally expensive than the particular neural network task.
Raposo teaches performance evaluation performed on a proxy task that is less computationally expensive than the particular neural network task ([Raposo, page 1, right column, lines 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training.” The model is evaluated on a proxy task and uses less data during training, which is interpreted as being computationally less expensive.
[Raposo, page 4, right column, C. Baselines] “We compare the performance of our 128-dimensional embeddings against two baselines: a 65-dimensional feature vector provided by Spotify and a 160-dimensional embeddings vector from the pre-trained model of [3]. The Spotify set, used before in [28], consists of rhythmic, harmonic, high-level structure, energy, and timbre features. The pre-trained model features are computed by a CNN-based model which was trained on supervised music tags, yet it produces embeddings that have been shown to be state-of-the-art in several tasks [3]. Hereby, we refer to these sets as Spotify and Choi.” This paragraph teaches comparing the performance of the pre-trained model and Spotify to see whether the proxy task is able to approximate the real performance. The Spotify dataset corresponds to the real performance and the pre-trained model corresponds to the proxy task.
[Raposo, page 4, VIII. RESULTS AND DISCUSSION] “Table I shows the MRR results. Our proposed embeddings outperform Spotify, which consists of typical handcrafted features, for this task, by 1.2 percentage points (pp) for instance-based MRR and 1.1 pp for class-based MRR, while performing comparably to Choi, the state-of-the-art embeddings. This is very promissing because Choi’s model is trained on more than 2083 hours of music, whereas our model was trained on less than 3 hours of both music and EEGs. This also means that our model is trained faster. In fact, our model finishes training in about 20 minutes, using an NVIDIA GeForce GTX 1080 graphics card...”).
evaluating performance of the trained neural network on the proxy task to determine a proxy performance metric for the candidate architecture on the proxy task, the proxy performance metric approximating a real performance metric of the candidate architecture on the particular neural network task ([Raposo, page 1, right column, lines 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training.” The model is evaluated on a proxy task and shows very promising results compared to the real result. The neural network of Raposo is interpreted as the candidate architecture.
[Raposo, page 4, VIII. RESULTS AND DISCUSSION] “Table I shows the MRR results. Our proposed embeddings outperform Spotify, which consists of typical handcrafted features, for this task, by 1.2 percentage points (pp) for instance-based MRR and 1.1 pp for class-based MRR, while performing comparably to Choi, the state-of-the-art embeddings. This is very promissing because Choi’s model is trained on more than 2083 hours of music, whereas our model was trained on less than 3 hours of both music and EEGs. This also means that our model is trained faster. In fact, our model finishes training in about 20 minutes, using an NVIDIA GeForce GTX 1080 graphics card...” This paragraph discloses that the proxy task performs as well as the real task.).
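For illustration only, the MRR (mean reciprocal rank) figure referred to in the quoted Raposo passage is conventionally computed as the average of 1/rank of the correct item over all queries. The sketch below shows that standard definition; it does not reproduce Raposo's instance-based or class-based variants, and the function name and the toy retrieval lists are hypothetical.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_lists: Sequence[Sequence[str]], targets: Sequence[str]
) -> float:
    """Average of 1/rank of the target in each ranked list (0 if the target is absent)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            # Ranks are 1-based: the first retrieved item has rank 1.
            total += 1.0 / (list(ranked).index(target) + 1)
    return total / len(targets)


# Hypothetical retrieval results for three queries whose correct item is "a".
ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
mrr = mean_reciprocal_rank(ranked_lists, ["a", "a", "a"])
print(round(mrr, 4))
```

With the correct item at ranks 1, 2, and 3, the result is (1 + 1/2 + 1/3) / 3.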
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Raposo in the method of Merity and Cheng, so that evaluation using a proxy task is combined with the sequence of classifiers to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would implement the function of Raposo, using the proxy task to evaluate the performance of a candidate architecture that is classified by a sequence of classifiers, as it is well known in the art to use a proxy task to evaluate the performance of a candidate architecture and indirectly supervise the model to improve performance.

Regarding claim 2, Merity in view of Cheng teaches the method of claim 1, wherein the sequence of classifiers are trained to assign score labels that indicate that input candidate architectures having high performance metrics on a task are accepted and input candidate architectures having low performance metrics on the task are rejected from further consideration ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores(labelled) of each of the plurality of candidate RNN architectures”, the selected candidate RNN architecture will be stored somewhere in memory, which is a surviving set, and [Merity, 0050] “A candidate RNN architecture with high performance score performs the given task with high accuracy. Accordingly, the results obtained from an RNN generated from the candidate RNN architecture with high performance score have a high likelihood of matching known results”, the process selects candidates based on performance scores, and the candidate with the higher score is expected to show higher task performance).
Merity in view of Cheng failed to teach performing evaluation using a proxy task.
Raposo teaches performing evaluation using a proxy task ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training." The model is evaluated on a proxy task and it uses less data during training.). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the proxy-task evaluation of Raposo into the method of Merity and Cheng, so that candidate architectures classified by the sequence of classifiers are evaluated on a proxy task while performing the neural network tasks and searching neural architectures. The modification would have been obvious because one of ordinary skill in the art would have used the proxy task of Raposo to evaluate the performance of a candidate architecture that is classified by a sequence of classifiers, as it is well known in the art to use a proxy task to evaluate the performance of a candidate architecture and to indirectly supervise the model to improve its performance.

Regarding claim 3, Merity in view of Cheng and further in view of Raposo teaches the method of claim 2, wherein selecting the candidate architecture from the surviving set of candidate architecture as the final architecture comprises: selecting, from the surviving set of candidate architectures, N candidate architectures having highest performance metrics ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores(labelled) of each of the plurality of candidate RNN architectures”, the selected candidate RNN architecture will be stored somewhere in memory, which is a surviving set, and [Merity, 0050] “A candidate RNN architecture with high performance score performs the given task with high accuracy. Accordingly, the results obtained from an RNN generated from the candidate RNN architecture with high performance score have a high likelihood of matching known results”, the process selects candidates based on performance scores, and the candidate with the higher score is expected to show higher task performance); determining, for each of the N candidate architectures, a respective real performance metric of the candidate architecture on the particular neural network task; and selecting the candidate architecture having a highest real performance metric as the final architecture ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores of each of the plurality of candidate RNN architectures”).
However, Merity in view of Cheng failed to teach proxy performance metrics on proxy task.
Raposo teaches proxy performance metrics on proxy task ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training." The model is evaluated on a proxy task.).
The same motivation that was utilized for combining Merity, Cheng, and Raposo as set forth in claim 2 is equally applicable to claim 3.
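For illustration only, the two-stage selection recited in claim 3 (keep the N candidates with the highest proxy metrics, then pick the one with the highest real metric) can be sketched as follows. This sketch is not taken from Merity, Cheng, or Raposo; the function name `select_final_architecture`, the architecture names, and all metric values are hypothetical.

```python
from typing import Callable, Dict, List


def select_final_architecture(
    proxy_scores: Dict[str, float],
    real_metric: Callable[[str], float],
    n: int,
) -> str:
    """Keep the top-n candidates by proxy score, then return the best by real metric."""
    # Rank the surviving candidates by proxy performance, highest first.
    top_n: List[str] = sorted(proxy_scores, key=proxy_scores.get, reverse=True)[:n]
    # Only these n candidates are evaluated on the (expensive) real task.
    return max(top_n, key=real_metric)


# Hypothetical surviving set; every number here is invented for illustration.
proxy = {"arch_a": 0.71, "arch_b": 0.80, "arch_c": 0.78, "arch_d": 0.55}
real = {"arch_a": 0.90, "arch_b": 0.84, "arch_c": 0.88, "arch_d": 0.95}

final = select_final_architecture(proxy, real.__getitem__, n=2)
print(final)  # arch_c
```

In this toy data, arch_d has the best real metric but is never evaluated on the real task because its proxy metric does not place it in the top N, which is the trade-off a proxy-based shortlist accepts.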

Regarding claim 16, Merity in view of Cheng, and further in view of Raposo teaches a system comprising one or more computers and one or more storage devices storing instructions ([Merity, claims 11 and 12] “A computer system comprising one or more computer processors and at least one non-transitory storage medium”). Claim 16 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

Regarding claim 20, Merity in view of Cheng, and further in view of Raposo teaches one or more non-transitory computer storage media storing instructions ([Merity, claims 11 and 12] “A computer system comprising one or more computer processors and at least one non-transitory storage medium”). Claim 20 is a non-transitory computer storage media claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1 above.

Claims 4 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), and further in view of Zhang (CN105046673 A).

Regarding claim 4, Merity in view of Cheng, and further in view of Raposo teaches the method of claim 1. 
However, Merity in view of Cheng, and further in view of Raposo failed to teach wherein repeatedly performing the operations comprises: repeatedly performing the operations until the number of candidate architectures sampled from the search space reaches a first predetermined threshold number.
Zhang teaches wherein repeatedly performing the operations comprises: repeatedly performing the operations until the number of candidate sampled from the search space reaches a first predetermined threshold number ([Zhang, page 5, the last paragraph] “The self-learning algorithm of the invention is suitable for multiple classification models, where the SVM algorithm is used for verification. In order to confirm validity of the algorithm, mainly for small sample experiment, namely, selecting 5 marked sample as an initial training samples, other samples as test samples, using 5 fold cross validation obtaining support vector machine model parameter c and sigma, wherein the distance threshold value delta unmarked sample number to select each selection in the iteration process, and the maximum iteration times is 20. each group of experiments were repeated ten times, namely randomly selecting training sample and testing sample, taking the result average value to obtain the classification precision”, Zhang specifies the number of iterations (i.e. threshold), and it repeatedly performs sampling operations to perform visible light image fusion classification).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the repeated sampling process of Zhang into the method of Merity, Cheng, and Raposo in order to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have sampled candidates from the candidate pool until a threshold number is reached, as it is well known in the art to sample repeatedly from a candidate pool: repeated sampling gives every candidate architecture an equal opportunity of selection, which prevents biased selection, and selecting candidates only up to a threshold number prevents overuse of storage devices.

Regarding claim 6, Merity in view of Cheng, and further in view of Raposo teaches the method of claim 1, candidate architecture for the neural network, and neural network tasks. 
Merity in view of Cheng and further in view of Raposo failed to teach wherein sampling, from the search space, the candidate for performing the particular task comprises: randomly sampling, from the search space, the candidate for the neural network for performing the particular task.
Zhang teaches wherein sampling, from the search space, the candidate for performing the particular task comprises: randomly sampling, from the search space, the candidate for performing the particular task ([Zhang, page 5, the last paragraph] “The self-learning algorithm of the invention is suitable for multiple classification models, where the SVM algorithm is used for verification. In order to confirm validity of the algorithm, mainly for small sample experiment, namely, selecting 5 marked sample as an initial training samples, other samples as test samples, using 5 fold cross validation obtaining support vector machine model parameter c and sigma, wherein the distance threshold value delta unmarked sample number to select each selection in the iteration process, and the maximum iteration times is 20. each group of experiments were repeated ten times, namely randomly selecting training sample and testing sample, taking the result average value to obtain the classification precision”, Zhang discloses the random sampling process, and it repeatedly performs sampling operations to perform visible light image fusion classification).
The same motivation that was utilized for combining Merity, Cheng, Raposo, and Zhang as set forth in claim 4 is equally applicable to claim 6.
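For illustration only, the limitations of claims 4 and 6 (randomly sampling candidates from a search space and repeating until the number of sampled candidates reaches a first predetermined threshold) can be sketched as follows. This sketch is not taken from Zhang or any other cited reference; the search space contents, `sample_architecture`, `sample_until_threshold`, and the threshold value are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical search space: each key is a design choice, each list its options.
SEARCH_SPACE: Dict[str, list] = {
    "layers": [2, 4, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng: random.Random) -> Dict[str, object]:
    """Draw one candidate by choosing uniformly at random for every design choice."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def sample_until_threshold(max_samples: int, seed: int = 0) -> List[Dict[str, object]]:
    """Repeat the sampling operation until max_samples candidates have been drawn."""
    rng = random.Random(seed)
    sampled: List[Dict[str, object]] = []
    while len(sampled) < max_samples:
        sampled.append(sample_architecture(rng))
    return sampled


candidates = sample_until_threshold(max_samples=5)
print(len(candidates))  # 5
```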

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), and further in view of Villan (JP 2004536300 A).

Regarding claim 5, Merity in view of Cheng, and further in view of Raposo teaches the method of claim 1. 
Merity in view of Cheng and further in view of Raposo failed to teach wherein repeatedly performing the operations comprises: repeatedly performing the operations until the number of candidate architectures in the surviving set of candidate architectures reaches a second predetermined threshold number.
Villan teaches wherein repeatedly performing the operations comprises: repeatedly performing the operations until the number of candidates in the surviving set of candidates reaches a second predetermined threshold number ([Villan, page 11, line 34] “A position determination method in which the step of enlarging is not executed or repeated when the number of indices of the candidate group after enlarging exceeds a threshold value”, Villan discloses a specific task (the step of enlarging) that executes or terminates when the number of candidates in a group reaches a specific threshold value).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the threshold-based stopping condition of Villan into the method of Merity, Cheng, and Raposo, so that the sampling process is used to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have performed a specific task only until the number of candidates in the candidate pool reaches a specific threshold number, as it is well known in the art to execute or terminate a task based on whether the number of candidates in the candidate pool has reached a threshold, because doing so prevents infinite loops and overuse of computational resources, as suggested by Villan.
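For illustration only, the stopping condition of claim 5 (repeat until the surviving set reaches a second predetermined threshold) can be sketched as follows. This sketch is not taken from Villan or any other cited reference; candidates are stood in for by integers, and the two accept/reject rules, `collect_survivors`, and the threshold value are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

# A classifier here is any function that accepts (True) or rejects (False) a candidate.
Classifier = Callable[[int], bool]


def collect_survivors(
    candidates: Iterable[int],
    classifiers: Sequence[Classifier],
    max_survivors: int,
) -> List[int]:
    """Pass candidates through every classifier in order; stop at max_survivors."""
    survivors: List[int] = []
    for candidate in candidates:
        # Stopping condition: the surviving set has reached the threshold size.
        if len(survivors) >= max_survivors:
            break
        # all() short-circuits, so a candidate is dropped at its first rejection.
        if all(classifier(candidate) for classifier in classifiers):
            survivors.append(candidate)
    return survivors


# Hypothetical accept/reject rules standing in for trained classifiers.
classifiers = [lambda c: c % 2 == 0, lambda c: c > 4]
survivors = collect_survivors(range(100), classifiers, max_survivors=3)
print(survivors)  # [6, 8, 10]
```

Bounding the loop by the size of the surviving set, rather than by the number of candidates examined, means the amount of sampling depends on how selective the classifiers are.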

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), and further in view of Barreto (Barreto et al, 2004, “Human-robot interaction based on Haar-like features and eigenfaces”).

Regarding claim 7, Merity in view of Cheng, and further in view of Raposo teaches the method of claim 1.
However, Merity in view of Cheng, and further in view of Raposo failed to teach wherein the score label is a binary score label.
Barreto teaches wherein the score label is a binary score label ([Barreto, VII. Results] “To train the detector, a set of face and nonface training images were used (i.e., all data are labelled as face or nonface, which is binary). The face training set consisted of over 4,000 hand labelled faces scaled and aligned to a base resolution of 24 × 24 pixels. The non-face subwindows used to train the detector come from over 6,000 images which were manually inspected and found to not contain any faces”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the binary score label of Barreto in the method of Merity, Cheng, and Raposo. The motivation is to improve the efficiency of the method, as it is more efficient to use a binary number to label the data if the data has only two options (face or not-a-face).
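For illustration only, a binary score label of the kind recited in claim 7 can be sketched as a two-valued accept/reject label derived from a performance metric. This sketch is not taken from Barreto or any other cited reference; the cutoff rule, the function name `binary_score_labels`, and the metric values are assumptions made for the example.

```python
from typing import Dict


def binary_score_labels(metrics: Dict[str, float], cutoff: float) -> Dict[str, int]:
    """Label each candidate 1 (accept) if its metric meets the cutoff, else 0 (reject)."""
    return {name: int(value >= cutoff) for name, value in metrics.items()}


# Hypothetical performance metrics; the cutoff of 0.75 is an arbitrary assumption.
metrics = {"arch_a": 0.62, "arch_b": 0.81, "arch_c": 0.75}
labels = binary_score_labels(metrics, cutoff=0.75)
print(labels)  # {'arch_a': 0, 'arch_b': 1, 'arch_c': 1}
```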

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), in view of Giacinto (Giacinto, 2001, An approach to the automatic design of multiple classifier systems), and further in view of Ming (CN104731937A).

Regarding claim 8, Merity in view of Cheng teaches the method of claim 1, wherein repeatedly performing the operations further comprises: determining, based on the performance metrics of the candidate architectures in the surviving set of candidate architectures, a respective score label for each candidate architecture in the surviving set of candidate architectures ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores (labelled) of each of the plurality of candidate RNN architectures”, and the selected candidate RNN architecture will be stored somewhere in memory, which is a surviving set.
[Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task. The RNN architecture generator 110 forms training data set comprising architecture-performance pairs based on the result of the training and evaluation. The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420. The training data set can also be used to train the architecture generator neural network further described herein”. This paragraph teaches evaluating the most promising (surviving set) candidate architectures.
[Merity, 0050] “A candidate RNN architecture with high performance score performs the given task with high accuracy. Accordingly, the results obtained from an RNN generated from the candidate RNN architecture with high performance score have a high likelihood of matching known results”, the process selects candidates based on performance scores, and the candidate with the higher score is expected to show higher task performance.); and training the classifier on the training data including (ii) a respective score label for each candidate architecture in the surviving set of candidate architectures ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores of each of the plurality of candidate RNN architectures”);
However, Merity in view of Cheng failed to teach proxy performance metric, initializing a new classifier; adding the new classifier to the sequence of classifiers, and training the new classifier on the training data including (i) the surviving set of candidate architectures.
Raposo teaches proxy performance metric ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training." The model is evaluated on a proxy task and it uses less data during training, which is interpreted as computationally less expensive.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the proxy-task evaluation of Raposo into the method of Merity and Cheng, so that candidate architectures classified by the sequence of classifiers are evaluated on a proxy task while performing the neural network tasks and searching neural architectures. The modification would have been obvious because one of ordinary skill in the art would have used the proxy task of Raposo to evaluate the performance of a candidate architecture that is classified by a sequence of classifiers, as it is well known in the art to use a proxy task to evaluate the performance of a candidate architecture and to indirectly supervise the model to improve its performance.
However, Merity in view of Cheng and further in view of Raposo failed to teach initializing a new classifier; adding the new classifier to the sequence of classifiers, and training the new classifier on the training data including (i) the surviving set of candidate architectures.
Giacinto teaches initializing a new classifier ([Giacinto, Introduction. page 26, and 3.1 The proposed approach. page 27] “In particular, instead of attempting to design an effective classifier ensemble directly, a large set of ``error-diverse'' but also ``error-correlated'' classifiers is initially created”, “In addition, the overproduce and choose strategy allows one to exploit effectively all the available methods for the creation of a set of ``candidate'' classifiers (see Section 2). The choice phase is then aimed at selecting the most accurate and diverse classifiers as MCS members … Let C be the initial ensemble of N classifiers created by the overproduction phase: C = {C1, C2, . . ., Cn}”, Giacinto discloses the process of creating initial ensemble of N classifiers by the overproduction phase, and the classifiers initialized to have “error-diverse” and “error-correlated” features); adding the new classifier to the sequence of classifiers ([Giacinto, Abstract, and 3.2.1] Adding and subtracting classifier is a common practice in the art of multiple classifier systems. Giacinto discloses the method of selecting and adding the classifiers from a large set of classifiers). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the classifier creation and selection of Giacinto into the method of Merity, Cheng, and Raposo, so that a new classifier is added to the sequence of classifiers and proxy-task evaluation is used with the sequence of classifiers to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have added a new classifier to the sequence of classifiers to improve the performance of the neural architecture search, as it is well known in the art that having multiple classifiers evaluate candidates multiple times improves classification performance; for example, adding more steps to a hiring process increases the likelihood of selecting a better candidate.
Merity in view of Cheng, in view of Raposo, and further in view of Giacinto failed to teach training new classifier on training data including (i) the surviving set of candidate architectures.
Ming teaches training the new classifier on the training data including (i) the surviving set of candidate ([Ming, 0071] “It should be noted that each training sample, contained in the training sample set is a known sample, thus, it can directly use the known samples for training to construct a classifier; ... to build a new classifier, until the stop condition of the classifier or classifier constructed of known sample meet. The classification accuracy rate is greater than or equal to the preset accuracy threshold value or the number of known sample number greater than or equal to the preset threshold. this embodiment is not particularly limited to this”, Ming discloses the process of inputting prepared candidate (i.e. surviving set of candidates) into a new machine learning model).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the classifier training of Ming into the method of Merity, Cheng, Raposo, and Giacinto, so that the new classifier is trained using pre-processed candidates and proxy-task evaluation is used with the sequence of classifiers to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have trained the new classifier using pre-processed candidates to improve the performance of the candidate classification used in the neural architecture search, as it is well known in the art to prepare data before using it to train machine learning models.

Regarding claim 17, Merity in view of Cheng teaches the system of claim 16, wherein repeatedly performing the operations further comprises: determining, based on the performance metrics of the candidate architectures in the surviving set of candidate architectures, a respective score label for each candidate architecture in the surviving set of candidate architectures ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores (labelled) of each of the plurality of candidate RNN architectures”, and the selected candidate RNN architecture will be stored somewhere in memory, which is a surviving set.
[Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task. The RNN architecture generator 110 forms training data set comprising architecture-performance pairs based on the result of the training and evaluation. The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420. The training data set can also be used to train the architecture generator neural network further described herein”. This paragraph teaches evaluating the most promising (surviving set) candidate architectures.
[Merity, 0050] “A candidate RNN architecture with high performance score performs the given task with high accuracy. Accordingly, the results obtained from an RNN generated from the candidate RNN architecture with high performance score have a high likelihood of matching known results”, the process selects candidate based on performance scores, and the candidate with higher score is expected to show higher task performance.); and (ii) a respective score label for each candidate architecture in the surviving set of candidate architectures ([Merity, Claim 1] “Selecting a candidate RNN architecture based on the scores of each of the plurality of candidate RNN architectures”);
However, Merity in view of Cheng failed to teach proxy performance metric, initializing a new classifier; adding the new classifier to the sequence of classifiers, and training the new classifier on the training data including (i) the surviving set of candidate architectures.
Raposo teaches proxy performance metric ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training.” The model is evaluated on a proxy task and it uses less data during training.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the proxy-task evaluation of Raposo into the system of Merity and Cheng, so that candidate architectures classified by the sequence of classifiers are evaluated on a proxy task while performing the neural network tasks and searching neural architectures. The modification would have been obvious because one of ordinary skill in the art would have used the proxy task of Raposo to evaluate the performance of a candidate architecture that is classified by a sequence of classifiers, as it is well known in the art to use a proxy task to evaluate the performance of a candidate architecture and to indirectly supervise the model to improve its performance.
Merity in view of Cheng, and further in view of Raposo failed to teach initializing a new classifier; adding the new classifier to the sequence of classifiers, and training the new classifier on the training data including (i) the surviving set of candidate architectures.
Giacinto teaches initializing a new classifier ([Giacinto, Introduction. page 26, and 3.1 The proposed approach. page 27] “In particular, instead of attempting to design an effective classifier ensemble directly, a large set of ``error-diverse'' but also ``error-correlated'' classifiers is initially created”, “In addition, the overproduce and choose strategy allows one to exploit effectively all the available methods for the creation of a set of ``candidate'' classifiers (see Section 2). The choice phase is then aimed at selecting the most accurate and diverse classifiers as MCS members … Let C be the initial ensemble of N classifiers created by the overproduction phase: C = {C1, C2, . . ., Cn}”, Giacinto discloses the process of creating initial ensemble of N classifiers by the overproduction phase, and the classifiers initialized to have “error-diverse” and “error-correlated” features); adding the new classifier to the sequence of classifiers ([Giacinto, Abstract, and 3.2.1] Adding and subtracting classifier is a common practice in the art of multiple classifier systems. Giacinto discloses the method of selecting and adding the classifiers from a large set of classifiers).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the classifier creation and selection of Giacinto into the system of Merity, Cheng, and Raposo, so that a new classifier is added to the sequence of classifiers and proxy-task evaluation is used with the sequence of classifiers to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have added a new classifier to the sequence of classifiers to improve the performance of the neural architecture search, as it is well known in the art that having multiple classifiers evaluate candidates multiple times improves classification performance; for example, adding more steps to a hiring process increases the likelihood of selecting a better candidate.
Merity in view of Cheng, in view of Raposo, in view of Giacinto failed to teach training the new classifier on the training data including (i) the surviving set of candidate architectures.
Ming teaches training the new classifier on the training data including (i) the surviving set of candidate architectures ([Ming, 0071] “It should be noted that each training sample, contained in the training sample set is a known sample, thus, it can directly use the known samples for training to construct a classifier; ... to build a new classifier, until the stop condition of the classifier or classifier constructed of known sample meet. The classification accuracy rate is greater than or equal to the preset accuracy threshold value or the number of known sample number greater than or equal to the preset threshold. this embodiment is not particularly limited to this”, Ming discloses the process of inputting prepared candidate (i.e. surviving set of candidates) into a new machine learning model).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the classifier training of Ming into the system of Merity, Cheng, Raposo, and Giacinto, so that the new classifier is trained using pre-processed candidates and proxy-task evaluation is used with the sequence of classifiers to perform the neural network tasks and search neural architectures. The modification would have been obvious because one of ordinary skill in the art would have trained the new classifier using pre-processed candidates to improve the performance of the candidate classification used in the neural architecture search, as it is well known in the art to prepare data before using it to train machine learning models.

Claims 9, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), in view of Giacinto (Giacinto, 2001, An approach to the automatic design of multiple classifier systems), in view of Ming (CN104731937A), and further in view of Nurdan (Nurdan et al. 2011, a mineral classification system with multiple artificial neural network using k-fold cross validation).

Regarding claim 9, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming teaches the method of claim 8, and only training the new classifier on the training data when the validation accuracy exceeds the accuracy threshold ([Merity, 0037-0038] “The candidate architecture ranking module 420 ranks each candidate architecture to predict performance of the candidate architecture … Therefore, the candidate architecture ranking module 420 estimates the performance of candidate architecture by using the architecture ranking neural network … The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task”, Merity estimates the performance of candidate architecture before training, because training is a slow process); further comprising: determining accuracy of k classifiers on the training data ([Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task … The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420”, The candidate architecture evaluation module in Merity determines accuracy of models (i.e. classifiers) on the training data); determining whether the accuracy of the k classifiers exceeds an accuracy threshold ([Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task … The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420”, The candidate architecture evaluation module in Merity determines accuracy of models (i.e. classifiers) on the training data).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach k-fold cross validation.
Nurdan teaches determining a k-fold cross validation accuracy of k classifiers on the training data ([Nurdan, Paragraph 2.3] “In the training of neural network, k-fold cross validation is used to make the test result more meaningful and reliable. In k-fold cross-validation, the whole original data is randomly partitioned into k equal size sub samples. Of the k sub samples, in each case, each of the k sub samples is used as validation data and the remaining is used for training. The cross-validation process is then repeated k times (the folds). The average of k results from the folds gives the test accuracy of the algorithm”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Nurdan into the method of Merity, Cheng, Raposo, Giacinto, and Ming to determine the k-fold cross validation accuracy of the classifiers when adding a new classifier to a sequence of classifiers, evaluating candidate architectures using a proxy task, and searching neural architectures. The suggestion and/or motivation for doing so is to estimate the accuracy of the multi-classifier neural architecture search, as suggested by Nurdan.
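For illustration only, the k-fold procedure quoted from Nurdan, followed by a threshold comparison, can be sketched as below. The sketch is not taken from any cited reference: the function names, the threshold value, and the majority-label stand-in classifier are assumptions made solely for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority_classifier(train_labels):
    """Hypothetical stand-in for training: always predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def k_fold_accuracy(features, labels, k, train_fn=train_majority_classifier):
    """Average validation accuracy of k classifiers, one per held-out fold.

    Assumes k <= len(labels) so that no fold is empty.
    """
    accuracies = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train_labels = [labels[i] for i in range(len(labels)) if i not in held]
        classifier = train_fn(train_labels)
        # Validate on the held-out fold only.
        correct = sum(classifier(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def exceeds_threshold(accuracy, threshold):
    """True when the cross validation accuracy is strictly above the threshold."""
    return accuracy > threshold
```

In this sketch a caller would compute `k_fold_accuracy(features, labels, k=5)` and train a further classifier only if `exceeds_threshold` returns true for an assumed threshold.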

Regarding claim 11, Merity in view of Cheng, in view of Raposo, in view of Giacinto, in view of Ming, and further in view of Nurdan teaches the method of claim 9, wherein k is a predetermined integer ([Nurdan, Paragraph 2.3] “In the training of neural network, k-fold cross validation is used to make the test result more meaningful and reliable. In k-fold cross-validation, the whole original data is randomly partitioned into k equal size sub samples. Of the k sub samples, in each case, each of the k sub samples is used as validation data and the remaining is used for training”).
The same motivation that was utilized for combining Merity, Cheng, Raposo, Giacinto, Ming and Nurdan as set forth in claim 9 is equally applicable to claim 11.

Regarding claim 18, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming teaches the system of claim 17, wherein the operations further comprise: determining an accuracy of k classifiers on the training data ([Merity, 0037-0038] “The candidate architecture ranking module 420 ranks each candidate architecture to predict performance of the candidate architecture …Therefore, the candidate architecture ranking module 420 estimates the performance of candidate architecture by using the architecture ranking neural network … The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task”, Merity estimates the performance of candidate architecture before training, because training is a slow process); determining whether the validation accuracy of the k classifiers exceeds an accuracy threshold ([Merity, 0038] “The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task … The RNN architecture generator 110 uses the training data set to train the architecture ranking neural network used by the candidate architecture ranking module 420”, The candidate architecture evaluation module in Merity determines accuracy of models (i.e. classifiers) on the training data); and only training the new classifier on the training data when the validation accuracy exceeds the accuracy threshold ([Merity, 0037-0038] “The candidate architecture ranking module 420 ranks each candidate architecture to predict performance of the candidate architecture …Therefore, the candidate architecture ranking module 420 estimates the performance of candidate architecture by using the architecture ranking neural network … The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task”, Merity estimates the performance of candidate architecture before training, because training is a slow process).
Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach k-fold cross validation accuracy.
However, Nurdan teaches k-fold cross validation ([Nurdan, Abstract] “To select training and test data, 5-fold-cross validation method was involved and multi-layer perceptron neural network (MLPNN) with one hidden layer was employed for classification”) and determining whether the k-fold cross validation accuracy of the k classifiers exceeds an accuracy threshold ([Nurdan, Abstract] “To select training and test data, 5-fold-cross validation method was involved and multi-layer perceptron neural network (MLPNN) with one hidden layer was employed for classification”, to test the accuracy of classifiers, checking whether the result of k-fold cross validation satisfies the accuracy threshold must be involved).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Nurdan into the system of Merity, Cheng, Raposo, Giacinto, and Ming to determine the k-fold cross validation accuracy of the classifiers when adding a new classifier to a sequence of classifiers, evaluating candidate architectures using a proxy task, and searching neural architectures. The suggestion and/or motivation for doing so is to estimate the accuracy of the multi-classifier neural architecture search, as suggested by Nurdan.

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), in view of Giacinto (Giacinto, 2001, An approach to the automatic design of multiple classifier systems), in view of Ming (CN104731937A), in view of Nurdan (Nurdan et al. 2011, a mineral classification system with multiple artificial neural network using k-fold cross validation), and further in view of Villan (JP2004536300A).

Regarding claim 10, Merity in view of Cheng, in view of Raposo, in view of Giacinto, in view of Ming, and further in view of Nurdan teaches the method of claim 9, further comprising: only training the new classifier on the training data when the accuracy exceeds the accuracy threshold ([Merity, 0046] “Accordingly, if the candidate architecture generator 310 determines that the size of the partial RNN architecture is less than (or equal to) the threshold value, the candidate architecture generator 310 repeats the steps 620, 630, 640, 650, and 660 to add other operators to the partial RNN architecture”).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach k-fold cross validation, and when the number of candidate architectures in the surviving set exceeds a threshold number.
Nurdan teaches k-fold cross validation ([Nurdan, Paragraph 2.3] “In the training of neural network, k-fold cross validation is used to make the test result more meaningful and reliable. In k-fold cross-validation, the whole original data is randomly partitioned into k equal size sub samples. Of the k sub samples, in each case, each of the k sub samples is used as validation data and the remaining is used for training”).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, in view of Ming, and further in view of Nurdan failed to teach when the number of candidate architectures in the surviving set exceeds a threshold number.
The same motivation that was utilized for combining Merity, Cheng, Raposo, Giacinto, Ming, and Nurdan as set forth in claim 9 is equally applicable to claim 10.
Villan teaches when the number of candidate architectures in the surviving set exceeds a threshold number ([Villan, page 11, line 34] “A position determination method in which the step of enlarging is not executed or repeated when the number of indices of the candidate group after enlarging exceeds a threshold value”, Villan discloses a specific task (step of enlarging) that executes or terminates when the number of candidates in a group reaches a specific threshold value).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Villan into the method of Merity, Cheng, Raposo, Giacinto, Ming, and Nurdan to use the sampling process to perform the neural network tasks and to search neural architectures. The modification would have been obvious because one of ordinary skill in the art would implement the function of Villan to perform specific tasks before the number of candidates in the candidate pool reaches a specific threshold number, as it is well known in the art to execute or terminate a task before the number of candidates in the candidate pool reaches a threshold, because terminating or executing when the threshold value is reached will prevent infinite loops and overuse of computation resources, as suggested by Villan.

Regarding claim 19, Merity in view of Cheng, in view of Raposo, in view of Giacinto, in view of Ming, and further in view of Nurdan teaches the system of claim 18, wherein the operations further comprise: only training the new classifier on the training data when the accuracy exceeds the accuracy threshold ([Merity, 0037-0038] “The candidate architecture ranking module 420 ranks each candidate architecture to predict performance of the candidate architecture …Therefore, the candidate architecture ranking module 420 estimates the performance of candidate architecture by using the architecture ranking neural network … The candidate architecture evaluation module 440 evaluates 430 the most promising candidate architectures by compiling their DSL specifications to executable code and training each model on the given task”, Merity estimates the performance of candidate architecture before training, because training is a slow process).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach k-fold cross validation, and when the number of candidate architectures in the surviving set exceeds a threshold number. The same motivation that was utilized for combining Merity, Cheng, Raposo, Giacinto, and Ming, as set forth in claim 17 is equally applicable to claim 19.
Nurdan teaches k-fold cross validation ([Nurdan, Paragraph 2.3] “In the training of neural network, k-fold cross validation is used to make the test result more meaningful and reliable. In k-fold cross-validation, the whole original data is randomly partitioned into k equal size sub samples. Of the k sub samples, in each case, each of the k sub samples is used as validation data and the remaining is used for training”).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, in view of Ming, and further in view of Nurdan failed to teach when the number of candidate architectures in the surviving set exceeds a threshold number.
The same motivation that was utilized for combining Merity, Cheng, Raposo, Giacinto, Ming, and Nurdan as set forth in claim 18 is equally applicable to claim 19.
Villan teaches when the number of candidates in the surviving set exceeds a threshold number ([Villan, page 11, line 34] “A position determination method in which the step of enlarging is not executed or repeated when the number of indices of the candidate group after enlarging exceeds a threshold value”, Villan discloses a specific task (step of enlarging) that executes or terminates when the number of candidates in a group reaches a specific threshold value).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Villan into the system of Merity, Cheng, Raposo, Giacinto, Ming, and Nurdan to use the sampling process to perform the neural network tasks and to search neural architectures. The modification would have been obvious because one of ordinary skill in the art would implement the function of Villan to perform specific tasks before the number of candidates in the candidate pool reaches a specific threshold number, as it is well known in the art to execute or terminate a task before the number of candidates in the candidate pool reaches a threshold, because terminating or executing when the threshold value is reached will prevent infinite loops and overuse of computation resources, as suggested by Villan.

Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), in view of Giacinto (Giacinto, 2001, An approach to the automatic design of multiple classifier systems), in view of Ming (CN104731937A), and further in view of Nakamalu (JP2017532959).

Regarding claim 12, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming teaches the method of claim 8, wherein determining, based on the performance metrics of the candidate architectures ([Merity, 0051] “The candidate architecture ranking module selects a subset of the plurality of candidate RNN architectures based on their performance scores”), a respective score label for each candidate architecture in the surviving set of candidate architectures comprises ([Merity, Claim 1] Selecting a candidate RNN architecture based on the scores of each of the plurality of candidate RNN architectures): 
Merity in view of Cheng failed to teach a proxy performance metric.
Raposo teaches a proxy performance metric ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training”, the model is evaluated on a proxy task and uses less data during training).
The same motivation that was utilized for combining Merity, Cheng, Raposo, and Giacinto as set forth in claim 8 is equally applicable to claim 12.
Merity in view of Cheng, in view of Raposo failed to teach determining a median value of the proxy performance metrics of the candidate architectures in the surviving set of candidate architectures; and comparing the performance of the current candidate architecture with the median value to determine the respective score label for the current candidate.
Nakamalu teaches determining a median value of the performance metrics of the candidate in the surviving set of candidate architectures ([Nakamalu, Page 6, the 4th paragraph of the page] “In some cases, the normalized expression or aggregate value of the gene or gene signature (i.e. score label) is compared to the normalized expression median of the gene or gene signature, or to the aggregate value, for a set of cancer or cancer type”, Nakamalu discloses the process of getting median value of gene signatures); and comparing the performance of the current candidate with the median value to determine the respective score label for the current candidate ([Nakamalu, Page 6, the 4th paragraph of the page] “In some cases, the normalized expression or aggregate value of the gene or gene signature (i.e. score label) is compared to the normalized expression median of the gene or gene signature, or to the aggregate value, for a set of cancer or cancer type”, Nakamalu discloses the process of comparing gene signature value to the median value of signatures).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Nakamalu into the method of Merity, Cheng, Raposo, Giacinto, and Ming to calculate a median value of the candidates and compare it with the current candidate when adding a new classifier to a sequence of classifiers, evaluating candidate architectures using a proxy task, and searching neural architectures. The motivation for doing so is to compare the general performance of the candidates with the score of the current candidate.

Regarding claim 13, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming teaches the method of claim 12.
Raposo teaches the proxy performance metric ([Raposo, page 1, right column, line 13-18] “We evaluate the effectiveness of this model in a transfer learning setting, using it as a feature extractor in a proxy task: music audio-lyrics cross-modal retrieval. We show that the proposed framework is able to achieve very promising results when compared against standard features and a state-of-the-art model, using much less data during training”, the model is evaluated on a proxy task and uses less data during training).
However, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach wherein comparing the performance metric of the current candidate with the median value to determine the respective score label for the current candidate comprises: when the performance metric of the current candidate is below the median value, assigning a negative score label to the current candidate; and when the performance metric of the current candidate is equal or above the median value, assigning a positive score label to the current candidate.
Nakamalu teaches wherein comparing the performance metric of the current candidate with the median value to determine the respective score label for the current candidate comprises ([Nakamalu, Page 6, the 4th paragraph of the page] “In some cases, the normalized expression or aggregate value of the gene or gene signature (i.e. score label) is compared to the normalized expression median of the gene or gene signature, or to the aggregate value, for a set of cancer or cancer type”, Nakamalu discloses the process of comparing gene signature value (i.e. score label) to the median value of signatures): when the performance metric of the current candidate is below the median value, assigning a negative score label to the current candidate architecture ([Nakamalu, Page 6, the 4th paragraph of the page] “In some cases, the normalized expression or aggregate value of the gene or gene signature (i.e. score label) is compared to the normalized expression median of the gene or gene signature, or to the aggregate value, for a set of cancer or cancer type And then it is determined to be increased or decreased”, Nakamalu discloses the process of determining if the gene signature value is increased or decreased, which corresponds to positive and negative labeling); and when the performance metric of the current candidate architecture is equal or above the median value, assigning a positive score label to the current architecture ([Nakamalu, Page 6, the 4th paragraph of the page] “In some cases, the normalized expression or aggregate value of the gene or gene signature (i.e. score label) is compared to the normalized expression median of the gene or gene signature, or to the aggregate value, for a set of cancer or cancer type And then it is determined to be increased or decreased”, Nakamalu discloses the process of determining if the gene signature value is increased or decreased, which corresponds to positive and negative labeling).
The same motivation that was utilized for combining Merity, Cheng, Raposo, Giacinto, and Nakamalu as set forth in claim 12 is equally applicable to claim 13.
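For illustration only, the median comparison recited in claims 12 and 13 can be sketched as below. The sketch is not taken from any cited reference: the function name `score_labels`, the candidate names, the metric values, and the use of +1 and -1 as the positive and negative score labels are assumptions made solely for the example.

```python
from statistics import median


def score_labels(metrics):
    """Label each candidate +1 if its metric is at or above the median, else -1.

    `metrics` maps a (hypothetical) candidate name to its performance metric.
    """
    median_value = median(metrics.values())
    return {
        name: (1 if value >= median_value else -1)
        for name, value in metrics.items()
    }


# Example with made-up metric values for three hypothetical candidates.
labels = score_labels({"arch_a": 0.6, "arch_b": 0.7, "arch_c": 0.9})
```

Here the median is 0.7, so `arch_a` falls below it and receives the negative label, while `arch_b` (equal to the median) and `arch_c` receive the positive label.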

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Merity (US 20180336453 A1) in view of Cheng (US 9449259 B1), in view of Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”), in view of Giacinto (Giacinto, 2001, An approach to the automatic design of multiple classifier systems), in view of Ming (CN104731937A), and further in view of Ye (Ye et al. 2009, Stochastic gradient boosted distributed decision trees).

Regarding claim 14, Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming teaches the method of claim 8. 
Merity in view of Cheng, in view of Raposo, in view of Giacinto, and further in view of Ming failed to teach wherein training the new classifier comprises: training the new classifier using a gradient boosted trees method.
However, Ye teaches wherein training the new classifier comprises: training the new classifier using a gradient boosted trees method ([Ye, Instruction] “Gradient tree boosting constructs an additive regression model, utilizing decision trees as the weak learner”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to implement the function of Ye into the method of Merity, Cheng, Raposo, Giacinto, and Ming to train classifiers using a gradient boosted trees method when adding the new classifier to a sequence of classifiers to improve neural architecture search performance, evaluating candidate architectures using a proxy task, and searching neural architectures. The modification would have been obvious because one of ordinary skill in the art would implement the function of Ye to train classifiers using a gradient boosted trees method to improve the performance of the neural architecture search, as it is well known in the art to use a gradient boosted trees method to combine weak learners (i.e. classifiers before training) to construct a stronger learner (i.e. classifiers after training).
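For illustration only, the additive model described in the passage quoted from Ye can be sketched as below. This is a simplified, single-machine sketch and not Ye's distributed algorithm: the squared-error loss, the one-dimensional input, the depth-1 trees (stumps), and all names and parameter values are assumptions made solely for the example.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for split in sorted(set(xs))[:-1]:  # skip the largest value so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build an additive model of stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Example with made-up data: a step from 0 to 1 between x = 2 and x = 3.
model = gradient_boost([1, 2, 3, 4], [0, 0, 1, 1])
```

Each stump alone is a weak learner; the sum of many shrunken stumps approximates the step in the example data closely.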

Response to Arguments
Applicant’s arguments filed 08/29/2022 have been fully considered but they are not persuasive.
Applicant respectfully argues that the 101 rejection is inapplicable to the claims because the amended limitation discloses a significant technological improvement compared to the existing technology.
Examiner respectfully disagrees with the Applicant’s statement. The entire claim language of claim 1 merely discloses the process of collecting candidate architectures and processing candidate architectures using a sequence of classifiers, which are mental processes. The disclosed hardware merely shows generic processors and memories without details, which are generic computer components. The claim merely recites selecting the better candidate architecture among a set of candidate architectures, which has already been commonly used in the art as shown in Merity (US 20180336453 A1); thus, it does not provide technical improvements over existing methods. Therefore, the claims are not patentable.
Furthermore, the amended limitation “the proxy performance metric approximating a real performance metric of the candidate architecture on the particular neural network task” is also a mental process, as it merely recites approximating the performance of a specific architecture by using a neural network, which is a technological field of use (MPEP 2106.05(h)).
Applicant’s arguments with respect to the 35 U.S.C. 103 rejections of claim(s) 1-20 have been considered but are not persuasive.
The applicant respectfully argues that the 35 U.S.C. 103 rejections must be withdrawn because the cited art does not teach or suggest “training a neural network having the candidate architecture on the proxy task” and “evaluating performance of the trained neural network on the proxy task to determine a proxy performance metric for the candidate architecture on the proxy task, the proxy performance metric approximating a real performance metric of the candidate architecture on the particular neural network task”.
The examiner respectfully disagrees. Paragraphs 0037 and 0038 of Merity disclose the candidate architecture ranking module 420, which estimates the real performance of a candidate architecture by using a neural network, and the candidate architecture evaluation module 440, which evaluates the candidate architectures. The new reference Cheng (US 9449259 B1) teaches the sequence of classifiers. The limitation of determining a proxy performance metric is taught by the newly added reference Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”).
The applicant also argues that the ‘embeddings’ of the reference Köhn is an additional feature, not a neural network, and Köhn does not teach evaluating candidate architecture.
The limitation of evaluating a candidate architecture is taught by the combination of Merity and the new reference Cheng. The combination of Merity and Cheng teaches the process of evaluating candidate architectures in paragraph 0038 of Merity. The reference Köhn, which has been replaced by the new reference Raposo, was only used to clearly disclose the process of using the ‘proxy task’ to evaluate the model.
Applicant’s arguments with respect to claim(s) 1-20 regarding the reference Köhn have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The new reference Raposo (Raposo et al, 2017, “Towards Deep Modeling of Music Semantics using EEG Regularizers”) is used to reject the limitations and disclose the process of evaluating effectiveness of neural network model. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Regarding proxy performance metric.
Schulman et al, 2017, “Proximal Policy Optimization Algorithms”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached M-F 7:30AM-4:00PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JUN KWON/
Patent Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/
Supervisory Patent Examiner, Art Unit 2127