DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The disclosure is objected to because of the following informalities:
  In paragraph 0032, line 3, “tje classification” should read “the classification”.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “a unit”, “a control unit”, “a training unit”, “a reward calculation unit”, and “an adjustment unit” in claim 1.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites the limitation “the time of iteration termination” in line 2.  There is insufficient antecedent basis for this limitation in the claim.  For examination purposes, the term “the time of iteration termination” will be interpreted to mean a time when the predetermined iteration termination condition is satisfied.
Claim 13 recites the limitation “the time of iteration termination” in line 2.  There is insufficient antecedent basis for this limitation in the claim.  For examination purposes, the term “the time of iteration termination” will be interpreted to mean a time when the predetermined iteration termination condition is satisfied.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim does not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation of “a computer readable recording medium” can encompass non-statutory transitory forms of signal transmission, such as a propagating electrical or electromagnetic signal per se.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 7-8, 11 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph et al. ("Neural Architecture Search with Reinforcement Learning"), hereinafter Zoph, in view of Qi et al. ("Contrastive-center Loss for Deep Neural Networks"), hereinafter Qi, and Wan et al. ("Rethinking Feature Distribution for Loss Functions in Image Classification"), hereinafter Wan.
Regarding claim 1, Zoph discloses a neural network architecture search apparatus, comprising:
a unit for defining search space for neural network architecture, configured to define a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
a control unit configured to perform sampling on the architecture parameters in the search space based on parameters of the control unit, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
a training unit configured to, by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 4.2, lines 22-30, "In our experiments, every child model is constructed and trained for 35 epochs. Every child model has two layers, with the number of hidden units adjusted so that total number of learnable parameters approximately match the “medium” baselines (Zaremba et al., 2014; Gal, 2015). In these experiments we only have the controller predict the RNN cell structure and fix all other hyperparameters. The reward function is c/(validation perplexity)^2 where c is a constant, usually set at 80. After the controller RNN is done training, we take the best RNN cell according to the lowest validation perplexity and then run a grid search over learning rate, weight initialization, dropout rates and decay epoch. The best cell found was then run with three different configurations and sizes to increase its capacity.");
a reward calculation unit configured to, by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, respectively calculate a classification accuracy (Section 3.2, lines 1-4, "The list of tokens that the controller predicts can be viewed as a list of actions a_{1:T} to design an architecture for a child network. At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller."; Section 4, line 1, "We apply our method to an image classification task with CIFAR-10"),
and to calculate, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 4.1, lines 18-19, "The reward used for updating the controller is the maximum validation accuracy of the last 5 epochs cubed."),
and an adjustment unit configured to feed back the reward score to the control unit, and to cause the parameters of the control unit to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger, wherein processing in the control unit, the training unit, the reward calculation unit and the adjustment unit are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
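For illustration only, the two reward definitions quoted from Zoph above (the maximum validation accuracy of the last 5 epochs cubed, and c/(validation perplexity)^2 with c usually set at 80) can be sketched as follows. The function and parameter names are hypothetical and do not appear in Zoph or in the application; this is a minimal sketch of the quoted arithmetic, not the reference's implementation.

```python
def accuracy_reward(val_accuracies, last_n=5, power=3):
    """Reward = (max validation accuracy over the last `last_n` epochs) ** power.

    `val_accuracies` is a hypothetical per-epoch list of accuracies in [0, 1].
    """
    if not val_accuracies:
        raise ValueError("need at least one epoch of validation accuracy")
    return max(val_accuracies[-last_n:]) ** power


def perplexity_reward(validation_perplexity, c=80.0):
    """Reward = c / perplexity^2, the form quoted for the RNN-cell experiments."""
    if validation_perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return c / validation_perplexity ** 2
```

Both functions grow as the child model improves (higher accuracy, lower perplexity), which is the property the quoted passages rely on when the reward is fed back to the controller.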
Zoph does not specifically disclose: calculate an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and to perform training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculate a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Qi teaches:
calculate an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and to perform training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; Section 1, lines 45-51, "In this paper, we propose the contrastive-center loss, which learns a center for each class. This new loss will simultaneously consider intra-class compactness and inter-class separability by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers."; Section 2, lines 5-7, "Using softmax loss assisted with our contrastive-center loss to train a deep neural network will really boost the performance of the network.").
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
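For illustration only, the loss characterized in the quoted passages of Qi can be sketched as a ratio of (1) the squared distance of a sample to its own class center and (2) the summed squared distances to the other class centers. The ratio form, the 1/2 factor, and the constant `delta` that keeps the denominator nonzero are assumptions made for this sketch and are not taken from the passages quoted in this Office action; all names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Sum over samples of intra-class distance / (inter-class distances + delta).

    features: list of feature vectors
    labels:   list of integer class indices into `centers`
    centers:  list of class-center vectors (assumed given, not learned here)
    """
    total = 0.0
    for x, y in zip(features, labels):
        # (1) distance to the corresponding class center (center-loss term)
        intra = squared_distance(x, centers[y])
        # (2) summed distance to the non-corresponding class centers
        inter = sum(squared_distance(x, c)
                    for k, c in enumerate(centers) if k != y)
        total += intra / (inter + delta)
    return 0.5 * total
```

Minimizing this quantity pulls each feature toward its own center while pushing it away from the other centers, which corresponds to the intra-class compactness and inter-class separability described in the quoted text.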
Zoph in view of Qi does not specifically disclose: calculate a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Wan teaches:
calculate a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class (Abstract, lines 1-2, "We propose a large-margin Gaussian Mixture (L-GM) loss for deep neural networks in classification tasks."; Section 1, lines 70-73, "the GM loss can be readily used to estimate the likelihood of an input to the learned training feature distribution, leading to the possibility of improving the model’s robustness, for example, towards adversarial examples."; Section 1, lines 29-36, "under the softmax loss formulation, the cosine distance based similarity metrics is more appropriate, indicating that using the Euclidean distance based additional losses may not be the most ideal choice. Based on this understanding, an angular distance based margin is introduced in [22] to force extra intra-class compactness and inter-class separability, leading to better generalization of the trained models.").
Wan teaches determining classification accuracy and evaluating feature distribution that indicates intra-class compactness in order to improve the capability of a trained neural network (Section 5, lines 1-10, "We proposed a loss function by assuming a Gaussian Mixture (GM) distribution of the deep features on the training set. Besides the classification loss, a log likelihood regularization term is added to explicitly drive the deep model for generating GM distributed features. To further improve the generalization capability of the trained model, a classification margin is introduced. Extensive experiments demonstrate that the proposed L-GM loss outperforms the softmax loss and its variants in both small and large-scale datasets when combined with different deep models.").
Zoph, Qi, and Wan are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wan to determine classification accuracy and evaluate feature distribution that indicates intra-class compactness.  Doing so would allow for improving the capability of a trained neural network.
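For illustration only, a score indicating intra-class compactness and a reward that depends on both classification accuracy and that score might be sketched as below. This is not the method of Wan, of the other cited references, or of the application: the spread measure (mean squared distance to the class mean), the mapping 1/(1 + spread), the additive combination, and the `weight` parameter are all assumptions chosen only to make the claimed relationship concrete.

```python
from collections import defaultdict


def intra_class_spread(features, labels):
    """Mean squared distance of each feature vector to its own class mean."""
    if not features:
        raise ValueError("need at least one sample")
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            total += sum((m[d] - mean[d]) ** 2 for d in range(dim))
    return total / len(features)


def feature_distribution_score(features, labels):
    """Hypothetical score in (0, 1]; higher means more compact classes."""
    return 1.0 / (1.0 + intra_class_spread(features, labels))


def reward_score(accuracy, distribution_score, weight=0.5):
    """Hypothetical reward combining accuracy and the compactness score."""
    return accuracy + weight * distribution_score
```

Under these assumptions, two architectures with equal validation accuracy receive different rewards when one produces more tightly clustered same-class features.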
Regarding claim 4, Zoph in view of Qi and Wan discloses the neural network architecture search apparatus as claimed in claim 1.  Qi further teaches:
wherein the feature distribution score is calculated based on a center loss indicating an aggregation degree between features of samples of a same class (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; Section 2.2, lines 22-26, "Comparing with those in the center loss, the class centers of our proposed contrastive-center loss will be updated to a more discrete distribution for the existence of penalization for too small distances between different class centers.");
and the classification accuracy is calculated based on an inter-class loss indicating a separation degree between features of samples of different classes (Section 1, lines 45-51, "In this paper, we propose the contrastive-center loss, which learns a center for each class. This new loss will simultaneously consider intra-class compactness and inter-class separability by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers."; Table 2, "Classification accuracy (%)").
Qi teaches determining feature distribution based on the center loss and determining classification accuracy based on the inter-class loss in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph, Qi, and Wan are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to further incorporate the teachings of Qi to determine feature distribution based on the center loss and determine classification accuracy based on the inter-class loss.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Regarding claim 7, Zoph in view of Qi and Wan discloses the neural network architecture search apparatus as claimed in claim 1.  Zoph further discloses:
wherein the control unit includes a recurrent neural network (Section 3.1, lines 10-12, "Once the controller RNN finishes generating an architecture, a neural network with this architecture is built and trained.").
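For illustration only, the sampling behavior described in the Zoph passages quoted for claim 1 (a recurrent controller that predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats, with each softmax prediction fed into the next time step) can be sketched as below. The option lists, hidden size, random untrained weights, and the scalar feedback of the chosen index are assumptions made for brevity; they are not taken from Zoph.

```python
import math
import random

# Hypothetical option lists for each per-layer hyperparameter.
CHOICES = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "stride_height": [1, 2, 3],
    "stride_width": [1, 2, 3],
    "num_filters": [24, 36, 48, 64],
}
HIDDEN = 8


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(num_layers, rng):
    """Sample one child architecture token by token from a tiny recurrent net."""
    # Random, untrained weights: this only illustrates the data flow.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    w_x = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
    heads = {name: [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in opts]
             for name, opts in CHOICES.items()}
    hidden = [0.0] * HIDDEN
    prev = 0.0  # previous choice, fed back as the next input
    layers = []
    for _ in range(num_layers):
        layer = {}
        for name, opts in CHOICES.items():
            # Recurrent state update from the previous state and previous token.
            hidden = [math.tanh(sum(w_h[i][j] * hidden[j] for j in range(HIDDEN))
                                + w_x[i] * prev) for i in range(HIDDEN)]
            # One softmax classifier per hyperparameter type.
            probs = softmax([sum(w * h for w, h in zip(row, hidden))
                             for row in heads[name]])
            idx = rng.choices(range(len(opts)), weights=probs)[0]
            layer[name] = opts[idx]
            prev = float(idx)
        layers.append(layer)
    return layers
```

Calling `sample_architecture(3, random.Random(0))` returns a list of three dictionaries, one per layer, each holding one sampled value for every key in `CHOICES`.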
Regarding claim 8, Zoph discloses a neural network architecture search method, comprising:
a step for defining search space for neural network architecture, of defining a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
a control step of performing sampling on the architecture parameters in the search space based on parameters of a control unit, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
a training step of, by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 4.2, lines 22-30, "In our experiments, every child model is constructed and trained for 35 epochs. Every child model has two layers, with the number of hidden units adjusted so that total number of learnable parameters approximately match the “medium” baselines (Zaremba et al., 2014; Gal, 2015). In these experiments we only have the controller predict the RNN cell structure and fix all other hyperparameters. The reward function is c/(validation perplexity)^2 where c is a constant, usually set at 80. After the controller RNN is done training, we take the best RNN cell according to the lowest validation perplexity and then run a grid search over learning rate, weight initialization, dropout rates and decay epoch. The best cell found was then run with three different configurations and sizes to increase its capacity.");
a reward calculation step of, by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, respectively calculating a classification accuracy (Section 3.2, lines 1-4, "The list of tokens that the controller predicts can be viewed as a list of actions a_{1:T} to design an architecture for a child network. At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller."; Section 4, line 1, "We apply our method to an image classification task with CIFAR-10"),
and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 4.1, lines 18-19, "The reward used for updating the controller is the maximum validation accuracy of the last 5 epochs cubed."),
and an adjustment step of feeding back the reward score to the control unit, and causing the parameters of the control unit to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger, wherein processing in the control step, the training step, the reward calculation step and the adjustment step are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
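For illustration only, the iterative control, training, reward calculation, and adjustment steps mapped above can be sketched as a policy-gradient loop that stops at a fixed step budget. To keep the sketch short, the controller is reduced to a softmax over a small fixed set of candidate architectures, and training plus validation of a child network is replaced by a lookup in a hypothetical `rewards` list; the learning rate, the moving-average baseline, and the step budget are likewise assumptions and not values taken from Zoph.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(rewards, max_steps=2000, lr=0.1, seed=0):
    """Policy-gradient search over len(rewards) candidate architectures."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # parameters of the simplified controller
    baseline = 0.0
    steps = 0
    while steps < max_steps:  # predetermined iteration termination condition
        probs = softmax(logits)
        # Control step: sample one candidate architecture.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        # Stand-in for the training and reward calculation steps.
        reward = rewards[action]
        # Adjustment step: move parameters toward higher expected reward.
        advantage = reward - baseline
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
        baseline = 0.9 * baseline + 0.1 * reward
        steps += 1
    return softmax(logits), steps
```

With `rewards = [0.1, 0.5, 0.9]`, the returned probabilities concentrate on the third candidate, mirroring the quoted statement that the controller gives higher probabilities to architectures that receive high accuracies.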
Zoph does not specifically disclose: calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Qi teaches:
calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; Section 1, lines 45-51, "In this paper, we propose the contrastive-center loss, which learns a center for each class. This new loss will simultaneously consider intra-class compactness and inter-class separability by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers."; Section 2, lines 5-7, "Using softmax loss assisted with our contrastive-center loss to train a deep neural network will really boost the performance of the network.").
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Zoph in view of Qi does not specifically disclose: calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Wan teaches:
calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class (Abstract, lines 1-2, "We propose a large-margin Gaussian Mixture (L-GM) loss for deep neural networks in classification tasks."; Section 1, lines 70-73, "the GM loss can be readily used to estimate the likelihood of an input to the learned training feature distribution, leading to the possibility of improving the model’s robustness, for example, towards adversarial examples."; Section 1, lines 29-36, "under the softmax loss formulation, the cosine distance based similarity metrics is more appropriate, indicating that using the Euclidean distance based additional losses may not be the most ideal choice. Based on this understanding, an angular distance based margin is introduced in [22] to force extra intra-class compactness and inter-class separability, leading to better generalization of the trained models.").
Wan teaches determining classification accuracy and evaluating feature distribution that indicates intra-class compactness in order to improve the generalization capability of a trained neural network (Section 5, lines 1-10, "We proposed a loss function by assuming a Gaussian Mixture (GM) distribution of the deep features on the training set. Besides the classification loss, a log likelihood regularization term is added to explicitly drive the deep model for generating GM distributed features. To further improve the generalization capability of the trained model, a classification margin is introduced. Extensive experiments demonstrate that the proposed L-GM loss outperforms the softmax loss and its variants in both small and large-scale datasets when combined with different deep models.").
Zoph, Qi, and Wan are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wan to determine classification accuracy and evaluate feature distribution that indicates intra-class compactness.  Doing so would allow for improving the generalization capability of a trained neural network.
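For illustration only, the two quantities discussed above can be sketched as follows. The classification accuracy is the ordinary fraction of correct predictions. The feature distribution score shown here is a hypothetical stand-in that maps the mean squared distance of each feature to its own class mean into the range (0, 1]; it is not the likelihood term of Wan and is not taken from the specification.

```python
def classification_accuracy(predicted, labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


def class_means(features, labels):
    """Per-class mean feature vector, keyed by class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def feature_distribution_score(features, labels):
    """Hypothetical compactness score: 1 / (1 + mean squared distance to class mean).

    Tighter same-class clusters give a score closer to 1.
    """
    means = class_means(features, labels)
    mean_sq = sum(
        sum((v - m) ** 2 for v, m in zip(x, means[y]))
        for x, y in zip(features, labels)
    ) / len(features)
    return 1.0 / (1.0 + mean_sq)
```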
Regarding claim 11, Zoph in view of Qi and Wan discloses the neural network architecture search method as claimed in claim 8.  Qi further teaches:
wherein the feature distribution score is calculated based on a center loss indicating an aggregation degree between features of samples of a same class (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; Section 2.2, lines 22-26, "Comparing with those in the center loss, the class centers of our proposed contrastive-center loss will be updated to a more discrete distribution for the existence of penalization for too small distances between different class centers.");
and the classification accuracy is calculated based on an inter-class loss indicating a separation degree between features of samples of different classes (Section 1, lines 45-51, "In this paper, we propose the contrastive-center loss, which learns a center for each class. This new loss will simultaneously consider intra-class compactness and inter-class separability by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers."; Table 2, "Classification accuracy (%)").
Qi teaches determining feature distribution based on the center loss and determining classification accuracy based on the inter-class loss in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph, Qi, and Wan are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to further incorporate the teachings of Qi to determine feature distribution based on the center loss and determine classification accuracy based on the inter-class loss.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Regarding claim 14, Zoph discloses a computer readable recording medium having stored thereon a program for causing a computer to perform the following steps:
a step for defining search space for neural network architecture, of defining a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
a control step of performing sampling on the architecture parameters in the search space based on parameters of a control unit, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
a training step of, by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 4.2, lines 22-30, "In our experiments, every child model is constructed and trained for 35 epochs. Every child model has two layers, with the number of hidden units adjusted so that total number of learnable parameters approximately match the “medium” baselines (Zaremba et al., 2014; Gal, 2015). In these experiments we only have the controller predict the RNN cell structure and fix all other hyperparameters. The reward function is c/(validation perplexity)^2 where c is a constant, usually set at 80. After the controller RNN is done training, we take the best RNN cell according to the lowest validation perplexity and then run a grid search over learning rate, weight initialization, dropout rates and decay epoch. The best cell found was then run with three different configurations and sizes to increase its capacity.");
a reward calculation step of, by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, respectively calculating a classification accuracy (Section 3.2, lines 1-4, "The list of tokens that the controller predicts can be viewed as a list of actions a_{1:T} to design an architecture for a child network. At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller."; Section 4, line 1, "We apply our method to an image classification task with CIFAR-10"),
and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 4.1, lines 18-19, "The reward used for updating the controller is the maximum validation accuracy of the last 5 epochs cubed."),
and an adjustment step of feeding back the reward score to the control unit, and causing the parameters of the control unit to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger, wherein processing in the control step, the training step, the reward calculation step and the adjustment step are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
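For illustration only, the iterative control, training, reward calculation, and adjustment steps mapped to Zoph above can be sketched as a toy loop. The sketch assumes a controller reduced to one independent softmax per decision (Zoph describes a recurrent controller), a REINFORCE-style update with a moving-average baseline, and a fixed step count as the termination condition. The search-space values, toy_child_accuracies (a stand-in for actually training a child model), and all names are hypothetical.

```python
import math
import random

# Hypothetical search space: each key is one architecture parameter.
SEARCH_SPACE = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "num_filters": [24, 36, 48, 64],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_architecture(logits, rng):
    """Control step: sample one option index per architecture parameter."""
    arch = {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(logits[name])
        arch[name] = rng.choices(range(len(options)), weights=probs)[0]
    return arch


def toy_child_accuracies(arch):
    """Stand-in for training a child model; returns per-epoch validation accuracies."""
    return [0.5 + 0.05 * sum(arch.values())] * 5


def reward_from_accuracies(val_accuracies):
    """Maximum validation accuracy of the last 5 epochs, cubed (per the quoted passage)."""
    return max(val_accuracies[-5:]) ** 3


def search(steps=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = {n: [0.0] * len(o) for n, o in SEARCH_SPACE.items()}
    baseline, best_arch, best_reward = 0.0, None, -1.0
    for _ in range(steps):  # predetermined iteration termination condition
        arch = sample_architecture(logits, rng)
        reward = reward_from_accuracies(toy_child_accuracies(arch))
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Adjustment step: raise the probability of choices that earned
        # above-baseline reward (gradient of log-softmax is onehot - probs).
        for name, idx in arch.items():
            probs = softmax(logits[name])
            for k, p in enumerate(probs):
                grad = (1.0 if k == idx else 0.0) - p
                logits[name][k] += lr * advantage * grad
        if reward > best_reward:
            best_arch, best_reward = arch, reward
    chosen = {n: SEARCH_SPACE[n][i] for n, i in best_arch.items()}
    return chosen, best_reward, logits
```

A call such as search(steps=200) returns the best sampled architecture, its reward, and the final controller logits.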
Zoph does not specifically disclose: calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Qi teaches:
calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; Section 1, lines 45-51, "In this paper, we propose the contrastive-center loss, which learns a center for each class. This new loss will simultaneously consider intra-class compactness and inter-class separability by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers."; Section 2, lines 5-7, "Using softmax loss assisted with our contrastive-center loss to train a deep neural network will really boost the performance of the network.").
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Zoph in view of Qi does not specifically disclose: calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class.
Wan teaches:
calculating a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class (Abstract, lines 1-2, "We propose a large-margin Gaussian Mixture (L-GM) loss for deep neural networks in classification tasks."; Section 1, lines 70-73, "the GM loss can be readily used to estimate the likelihood of an input to the learned training feature distribution, leading to the possibility of improving the model’s robustness, for example, towards adversarial examples."; Section 1, lines 29-36, "under the softmax loss formulation, the cosine distance based similarity metrics is more appropriate, indicating that using the Euclidean distance based additional losses may not be the most ideal choice. Based on this understanding, an angular distance based margin is introduced in [22] to force extra intra-class compactness and inter-class separability, leading to better generalization of the trained models.").
Wan teaches determining classification accuracy and evaluating feature distribution that indicates intra-class compactness in order to improve the generalization capability of a trained neural network (Section 5, lines 1-10, "We proposed a loss function by assuming a Gaussian Mixture (GM) distribution of the deep features on the training set. Besides the classification loss, a log likelihood regularization term is added to explicitly drive the deep model for generating GM distributed features. To further improve the generalization capability of the trained model, a classification margin is introduced. Extensive experiments demonstrate that the proposed L-GM loss outperforms the softmax loss and its variants in both small and large-scale datasets when combined with different deep models.").
Zoph, Qi, and Wan are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wan to determine classification accuracy and evaluate feature distribution that indicates intra-class compactness.  Doing so would allow for improving the generalization capability of a trained neural network.
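For illustration only, a reward score based on both the classification accuracy and the feature distribution score, as the combination above contemplates, can be sketched as a weighted sum. The weighted-sum form, the weight alpha, and the candidate record layout are assumptions made for this sketch; none of the cited references is relied upon for this particular formula.

```python
def reward_score(accuracy, feature_score, alpha=0.5):
    """Hypothetical reward: convex combination of accuracy and compactness score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * accuracy + alpha * feature_score


def rank_architectures(candidates, alpha=0.5):
    """Order candidate architectures from highest to lowest reward score."""
    return sorted(
        candidates,
        key=lambda c: reward_score(c["accuracy"], c["feature_score"], alpha),
        reverse=True,
    )
```

Under these assumptions, a candidate with slightly lower accuracy but much more compact same-class features can outrank a candidate with higher accuracy alone.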
Claims 2, 6, 9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph in view of Qi and Wan, and further in view of Wu et al. (US Patent No. 10,846,552), hereinafter Wu.
Regarding claim 2, Zoph in view of Qi and Wan discloses the neural network architecture search apparatus as claimed in claim 1, but does not specifically disclose:
wherein the unit for defining search space for neural network architecture is configured to define the search space for open-set recognition.
Wu teaches:
wherein the unit for defining search space for neural network architecture is configured to define the search space for open-set recognition (Column 5, lines 7-11, "The detector can use any of a number of different types of object detection algorithms, as may relate to use of a convolutional neural network, including algorithms such as Faster R-CNN, SSD and YOLO, among others as discussed elsewhere herein."; Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 10, lines 55-61, "Once at least one object region is identified, the object region can be selected 610 for verification. In order to reduce the search space, or otherwise reduce the amount of processing that would otherwise be needed to analyze a large set of images, a subset of similar object images can be determined 612 that are related in some way to the content of the object region to be verified.").
Wu teaches a search space for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, Wan, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Wu to use a search space for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 6, Zoph in view of Qi and Wan discloses the neural network architecture search apparatus as claimed in claim 1, but does not specifically disclose:
wherein the at least one sub-neural network architecture obtained at the time of iteration termination is used for open-set recognition. 
Wu teaches:
wherein the at least one sub-neural network architecture obtained at the time of iteration termination is used for open-set recognition (Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 6, lines 18-22, "The classifier in some embodiments can be part of a neural network or machine learning algorithm, where the output once verified can be fed back into the machine learning algorithm for additional training."). 
Wu teaches a neural network used for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, Wan, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Wu to use a neural network for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
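For illustration only, an open-set decision rule of the general kind discussed above can be sketched as nearest-reference matching with a rejection threshold. This is a generic sketch and not the pipeline of Wu; the gallery layout, the Euclidean distance, the threshold, and the "unknown" label are all assumptions.

```python
import math


def open_set_predict(feature, gallery, threshold):
    """Return the label of the nearest reference feature, or "unknown".

    gallery maps a label to one reference feature vector. A query farther
    than threshold from every reference is rejected instead of being forced
    into a known class.
    """
    best_label, best_dist = None, float("inf")
    for label, ref in gallery.items():
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, ref)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else "unknown"
```

In this sketch a new class is supported by adding a reference feature to the gallery, without retraining the feature extractor.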
Regarding claim 9, Zoph in view of Qi and Wan discloses the neural network architecture search method as claimed in claim 8, but does not specifically disclose:
wherein in the step for defining search space for neural network architecture, the search space is defined for open-set recognition.
Wu teaches:
wherein in the step for defining search space for neural network architecture, the search space is defined for open-set recognition (Column 5, lines 7-11, "The detector can use any of a number of different types of object detection algorithms, as may relate to use of a convolutional neural network, including algorithms such as Faster R-CNN, SSD and YOLO, among others as discussed elsewhere herein."; Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 10, lines 55-61, "Once at least one object region is identified, the object region can be selected 610 for verification. In order to reduce the search space, or otherwise reduce the amount of processing that would otherwise be needed to analyze a large set of images, a subset of similar object images can be determined 612 that are related in some way to the content of the object region to be verified.").
Wu teaches a search space for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, Wan, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Wu to use a search space for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 13, Zoph in view of Qi and Wan discloses the neural network architecture search method as claimed in claim 8, but does not specifically disclose:
wherein the at least one sub-neural network architecture obtained at the time of iteration termination is used for open-set recognition.
Wu teaches:
wherein the at least one sub-neural network architecture obtained at the time of iteration termination is used for open-set recognition (Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 6, lines 18-22, "The classifier in some embodiments can be part of a neural network or machine learning algorithm, where the output once verified can be fed back into the machine learning algorithm for additional training."). 
Wu teaches a neural network used for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, Wan, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Wu to use a neural network for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph in view of Qi, Wan, and Wu, and further in view of Tate et al. (US Patent No. 10,762,606), hereinafter Tate.
Regarding claim 3, Zoph in view of Qi, Wan, and Wu discloses the neural network architecture search apparatus as claimed in claim 2.  Zoph further discloses:
wherein the unit for defining search space for neural network architecture is configured to define the neural network architecture as including a predetermined number of block units and the predetermined number of feature integration layers, and is configured to define a structure of each feature integration layer of the predetermined number of feature integration layers in advance ("To limit the search space complexity we have our model predict 13 layers where each layer prediction is a fully connected block of 3 layers."),
and the control unit is configured to perform sampling on the architecture parameters in the search space, to form each block unit of the predetermined number of block units, so as to generate each sub-neural network architecture of the at least one sub-neural network architecture (Section 3.4, lines 17-21, "To make this process more clear, we show an example in Figure 5, for a tree structure that has two leaf nodes and one internal node. The leaf nodes are indexed by 0 and 1, and the internal node is indexed by 2. The controller RNN needs to first predict 3 blocks, each block specifying a combination method and an activation function for each tree index. After that it needs to predict the last 2 blocks that specify how to connect c_t and c_{t-1} to temporary variables inside the tree."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.").
Zoph in view of Qi, Wan, and Wu does not specifically disclose: performing transformation on features of samples, performing integration on the features of the samples which are arranged in series, wherein one of the feature integration layers is arranged downstream of each block unit.
Tate teaches:
performing transformation on features of samples (Column 4, lines 41-45, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process."),
performing integration on the features of the samples which are arranged in series (Column 4, lines 46-51, "Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN."),
wherein one of the feature integration layers is arranged downstream of each block unit (Column 4, lines 41-51, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process. Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN.").
Tate teaches performing feature transformations followed by performing feature integration in order to generate a high-quality image from low-quality images from different viewpoint positions (Column 17, lines 63-66, "As described above, in the embodiment, the mode has been described in which low quality images from different viewpoint positions are integrated to generate a high quality image.").
Zoph, Qi, Wan, Wu, and Tate are considered to be analogous to the claimed invention because they are in the same field of neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi, Wan, and Wu to incorporate the teachings of Tate to perform feature transformations followed by feature integration.  Doing so would allow for generating a high-quality image from low-quality images from different viewpoint positions.
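For illustration only, the sequence relied upon from Tate (a feature transformation applied to each input, followed by a feature integration that connects the results into one feature) can be sketched on one-dimensional signals. The sketch assumes that "connected" means concatenation and uses a two-stage one-dimensional convolution with a fixed, hypothetical kernel in place of the CNN layers of Tate; all names are hypothetical.

```python
def conv1d_valid(signal, kernel):
    """Valid-mode sliding dot product (cross-correlation, as in CNN layers)."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def relu(values):
    return [max(0, v) for v in values]


def feature_transform(signal, kernel=(1, 1)):
    """Two-stage convolution standing in for one feature transformation."""
    stage1 = relu(conv1d_valid(signal, kernel))
    return relu(conv1d_valid(stage1, kernel))


def feature_integration(features):
    """Connect (concatenate) the per-input features into one feature."""
    integrated = []
    for f in features:
        integrated.extend(f)
    return integrated
```

For example, three inputs of length 5 each yield a transformed feature of length 3, and the integrated feature has length 9.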
Regarding claim 10, Zoph in view of Qi, Wan, and Wu discloses the neural network architecture search method as claimed in claim 9.  Zoph further discloses:
in the step for defining search space for neural network architecture, a structure of each feature integration layer of the predetermined number of feature integration layers is defined in advance ("To limit the search space complexity we have our model predict 13 layers where each layer prediction is a fully connected block of 3 layers."),
in the control step, sampling is performed on the architecture parameters in the search space based on parameters of the control unit, to form each block unit of the predetermined number of block units, so as to generate each sub-neural network architecture of the at least one sub-neural network architecture (Section 3.4, lines 17-21, "To make this process more clear, we show an example in Figure 5, for a tree structure that has two leaf nodes and one internal node. The leaf nodes are indexed by 0 and 1, and the internal node is indexed by 2. The controller RNN needs to first predict 3 blocks, each block specifying a combination method and an activation function for each tree index. After that it needs to predict the last 2 blocks that specify how to connect c_t and c_{t-1} to temporary variables inside the tree."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.").
Zoph in view of Qi, Wan, and Wu does not specifically disclose: performing transformation on features of samples, performing integration on the features of the samples which are arranged in series, wherein one of the feature integration layers is arranged downstream of each block unit.
Tate teaches:
performing transformation on features of samples (Column 4, lines 41-45, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process."),
performing integration on the features of the samples which are arranged in series (Column 4, lines 46-51, "Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN."),
wherein one of the feature integration layers is arranged downstream of each block unit (Column 4, lines 41-51, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process. Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN.").
Tate teaches performing feature transformations followed by performing feature integration in order to generate a high-quality image from low-quality images from different viewpoint positions (Column 17, lines 63-66, "As described above, in the embodiment, the mode has been described in which low quality images from different viewpoint positions are integrated to generate a high quality image.").
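For illustration only, the order of operations relied upon from Tate (a feature transformation per input, followed by one feature integration) can be sketched as below. This is not Tate's implementation: the function names are hypothetical, the per-element scaling merely stands in for the two-stage convolution, and reading "connected" as concatenation is an assumption.

```python
from typing import List, Sequence

Feature = List[float]


def transform(image: Sequence[float], scale: float) -> Feature:
    """Stand-in for a feature transformation (a two-stage convolution in Tate).

    Here it is only a per-element scaling, chosen for illustration.
    """
    return [scale * pixel for pixel in image]


def integrate(features: Sequence[Feature]) -> Feature:
    """Feature integration: connect the per-input features into one feature."""
    merged: Feature = []
    for feature in features:
        merged.extend(feature)
    return merged


# Three toy "low quality images", one per viewpoint position.
low_quality_images = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
features = [transform(img, scale=0.5) for img in low_quality_images]
combined = integrate(features)
```

Each input is transformed independently, and the integration step runs once, downstream of all the transformations.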
Zoph, Qi, Wan, Wu, and Tate are considered to be analogous to the claimed invention because they are in the same field of neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi, Wan, and Wu to incorporate the teachings of Tate to perform feature transformations followed by feature integration.  Doing so would allow for generating a high-quality image from low-quality images from different viewpoint positions.
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph in view of Qi and Wan, and further in view of Liu et al. ("Progressive Neural Architecture Search"), hereinafter Liu.
Regarding claim 5, Zoph in view of Qi and Wan discloses the neural network architecture search apparatus as claimed in claim 1, but does not specifically disclose:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip.
Liu teaches:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip (Section 3.1, lines 15-20, "The operator space O is the following set of 8 functions, each of which operates on a single tensor: • 3x3 depthwise-separable convolution  • 5x5 depthwise-separable convolution  • 7x7 depthwise-separable convolution  • 1x7 followed by 7x1 convolution  • identity  • 3x3 average pooling  • 3x3 max pooling  • 3x3 dilated convolution").
Liu teaches a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution in order to efficiently use machine learning to determine the structure of convolutional neural networks (Abstract, lines 1-4, "We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.").
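For illustration only, the operator space quoted from Liu can be written as a small enumeration. The names Operator, OPERATOR_SPACE, and operator_pairs are hypothetical, and enumerating ordered pairs is just one example of forming combinations from the set; it is not a statement of how Liu builds its search space.

```python
from enum import Enum
from itertools import product


class Operator(Enum):
    """The 8 operators listed in the quoted passage of Liu, Section 3.1."""

    SEP_CONV_3X3 = "3x3 depthwise-separable convolution"
    SEP_CONV_5X5 = "5x5 depthwise-separable convolution"
    SEP_CONV_7X7 = "7x7 depthwise-separable convolution"
    CONV_1X7_7X1 = "1x7 followed by 7x1 convolution"
    IDENTITY = "identity"
    AVG_POOL_3X3 = "3x3 average pooling"
    MAX_POOL_3X3 = "3x3 max pooling"
    DIL_CONV_3X3 = "3x3 dilated convolution"


OPERATOR_SPACE = list(Operator)

# Illustrative combinations: every ordered pair of operators from the set.
operator_pairs = list(product(OPERATOR_SPACE, repeat=2))
```

With 8 operators, the ordered pairs number 8 x 8 = 64.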
Zoph, Qi, Wan, and Liu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Liu to use a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution.  Doing so would allow for efficiently using machine learning to determine the structure of convolutional neural networks.
Regarding claim 12, Zoph in view of Qi and Wan discloses the neural network architecture search method as claimed in claim 8, but does not specifically disclose:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip.
Liu teaches:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip (Section 3.1, lines 15-20, "The operator space O is the following set of 8 functions, each of which operates on a single tensor: • 3x3 depthwise-separable convolution  • 5x5 depthwise-separable convolution  • 7x7 depthwise-separable convolution  • 1x7 followed by 7x1 convolution  • identity  • 3x3 average pooling  • 3x3 max pooling  • 3x3 dilated convolution").
Liu teaches a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution in order to efficiently use machine learning to determine the structure of convolutional neural networks (Abstract, lines 1-4, "We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.").
Zoph, Qi, Wan, and Liu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wan to incorporate the teachings of Liu to use a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution.  Doing so would allow for efficiently using machine learning to determine the structure of convolutional neural networks.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Singh et al. (US Patent No. 11,334,791) teaches a system and method for using a recurrent neural network to generate a deep network architecture.
Nachum et al. (US Patent No. 11,315,019) teaches a system and method for training a neural network by learning the structure and the parameters of the neural network.
Merity et al. (US Patent Application Publication No. 2018/0336453) teaches a system and method for automatically generating recurrent neural network architectures.
Vasudevan et al. (US Patent Application Publication No. 2019/0026639) teaches a system and method for using a controller neural network to determine a network architecture for a convolutional neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657