DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed September 9, 2022 has been entered.  Claims 1, 3 – 6, 8 and 10 – 14 are pending in the application.  Applicant’s amendments to the Specification and Claims have overcome each and every objection, 35 U.S.C. 112(b) rejection, and 35 U.S.C. 101 rejection previously set forth in the Non-Final Office Action mailed June 10, 2022.
Response to Arguments
Applicant's arguments filed September 9, 2022 have been fully considered but they are not persuasive.
On page 15 of Applicant’s response, Applicant argues “On pages 10-11, the Office Action alleges that it would have been obvious to someone of ordinary skill in the art to combine the technical solution of Zoph with that of Qi and Wan. However, the reward score in Zoph is used to adjust, for each trained sub-neural network architecture, while the structure of the trained sub-neural network architecture. The "feature distribution score" in Wan is used to improve the classification accuracy of deep neural networks for classification tasks, but does not relate to the structure of any sub-neural network architecture. Therefore, those skilled in the art would not be motivated to calculate the "reward" in Zoph based on the "feature distribution score" in Wan as to adjust the structure of the trained sub-neural network architecture.”  Zoph et al. ("Neural Architecture Search with Reinforcement Learning"), hereinafter Zoph, recites, in section 3.2, lines 2-4, “At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller.”  Zoph also recites, in section 4.1, lines 17-19, “Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs. The reward used for updating the controller is the maximum validation accuracy of the last 5 epochs cubed.”  Zoph thus discloses that the reward signal used to train the controller network is determined by the training of the child network, where the maximum validation accuracy reached in training the child network is used as the reward signal for training the controller network.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to replace the child network of Zoph with a different neural network, trained with a different reward function, and use the final reward value of the child network to train the controller network to perform Neural Architecture Search as described in Zoph.
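For illustration only, the reward computation recited in section 4.1 of Zoph, as quoted above, may be sketched as follows. The function name zoph_reward, its parameters, and the sample accuracy values are hypothetical and do not appear in Zoph; only the rule itself (maximum validation accuracy of the last 5 epochs, cubed) is taken from the quoted passage.

```python
def zoph_reward(val_accuracies, window=5, power=3):
    """Max validation accuracy over the last `window` epochs, raised to `power`."""
    if not val_accuracies:
        raise ValueError("at least one validation accuracy is required")
    recent = val_accuracies[-window:]  # only the final epochs are considered
    return max(recent) ** power


# Hypothetical per-epoch validation accuracies for one trained child network.
accuracies = [0.10, 0.95, 0.60, 0.70, 0.90, 0.80, 0.75]
# The 0.95 value lies outside the last five epochs, so 0.90 is the maximum used.
reward = zoph_reward(accuracies)
```

Under this sketch, the scalar returned for each trained child network is the value fed back to the controller network.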
On page 15 of Applicant’s response, Applicant argues “Finally, the independent claims now recite that "the processor is configured to define the search space for open-set recognition". In open-set recognition, the classification accuracy cannot be calculated for unknown classes since the unknown classes do not have class labels. That is to say, in the open-set recognition, the performance of the open-set recognition cannot be accurately reflected only by the classification accuracy calculated based on the known class. In order to more accurately reflect the performance of open-set recognition, in addition to the classification accuracy, the amended independent claims also calculate the reward score of the sub-neural network architecture based on the feature distribution score. In contrast, all of Zoph, Qi and Wan are directed to close-set recognition, instead of the open-set recognition.”  Qi et al. ("Contrastive-center Loss for Deep Neural Networks"), hereinafter Qi, recites, in the Abstract, lines 1-8, “The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class.”  Qi discloses training a neural network using a softmax loss, which is minimized to reduce classification loss and therefore improve classification accuracy, together with a contrastive-center loss, which is minimized to improve the discriminative power of the features and thus corresponds to a feature distribution score.
Also, Qi does not specifically disclose that a neural network trained with a combination of softmax loss and contrastive-center loss is limited to closed-set recognition.  By utilizing the combination of softmax loss and contrastive-center loss to reward classification accuracy and feature distribution, Qi discloses a training method applicable to open-set recognition.  Wu et al. (US Patent No. 10,846,552), hereinafter Wu, recites, in column 5, lines 54-62, “In this example, the image data including the representation of the object of interest is provided as input 402. In this example the input is provided to a trained neural network 404, although any appropriate machine learning algorithm or process, or similar approach, can be utilized as well within the scope of the various embodiments. As mentioned, the network can be trained on various types of objects, such as logos, items, products, buildings, geographic locations, and the like.”  Wu also recites, in column 7, lines 18-23, “Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data.”  Wu discloses training a neural network for open-set recognition.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to replace the child network of Zoph with a neural network trained for open-set recognition, as in Wu, trained to reward classification accuracy and feature distribution, as in Qi, and use the final classification accuracy and feature distribution scores of the child network as the reward function to train a controller network to perform Neural Architecture Search as described in Zoph.
Applicant’s remaining arguments with respect to claims 1, 3 – 6, 8 and 10 – 14 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 6, 8, 11 and 13 – 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph et al. ("Neural Architecture Search with Reinforcement Learning"), hereinafter Zoph, in view of Qi et al. ("Contrastive-center Loss for Deep Neural Networks"), hereinafter Qi, and Wu et al. (US Patent No. 10,846,552), hereinafter Wu.
Regarding claim 1, Zoph discloses a neural network architecture search apparatus, comprising:
a memory; and a processor coupled to the memory (Section 5, lines 1-6, "In this paper we introduce Neural Architecture Search, an idea of using a recurrent neural network to compose neural network architectures. By using recurrent network as the controller, our method is flexible so that it can search variable-length architecture space. Our method has strong empirical performance on very challenging benchmarks and presents a new research direction for automatically finding good neural network architectures. The code for running the models found by the controller on CIFAR-10 and PTB will be released"; The code for running the models demonstrates that the recurrent neural network controller is implemented in software executing from memory and running on a processor.),
wherein the processor is configured to:
define a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
perform sampling on the architecture parameters in the search space based on parameters of a recurrent neural network, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
and feed back the reward score to the recurrent neural network, and cause the parameters of the recurrent neural network to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."),
wherein processing of feeding back the reward score and adjusting the parameters are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
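For illustration only, the sample, reward, and policy-gradient feedback loop described in the passages of Zoph quoted above may be sketched as follows. This is a simplified sketch under stated assumptions and not the implementation of Zoph: the recurrent controller is collapsed to a single categorical distribution over a fixed list of candidate architectures, no baseline is subtracted from the reward, and the names softmax, search, and child_rewards, as well as all numeric values, are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search(child_rewards, max_steps=200, lr=0.5, seed=0):
    """Toy controller choosing among len(child_rewards) candidate architectures.

    child_rewards[i] stands in for the validation accuracy obtained after
    training candidate i (hypothetical values, not taken from Zoph).
    """
    rng = random.Random(seed)
    logits = [0.0] * len(child_rewards)
    for _ in range(max_steps):  # termination condition: fixed step budget
        probs = softmax(logits)
        # Sample one architecture from the controller's current distribution.
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        reward = child_rewards[choice]  # reward fed back to the controller
        # REINFORCE: d log pi(choice) / d logit_i = onehot_i - probs_i.
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


final_probs = search([0.2, 0.5, 0.9])
```

In this sketch each update raises the probability of the sampled candidate in proportion to its reward, which corresponds to adjusting the controller parameters toward architectures that receive larger reward scores.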
Zoph does not specifically disclose: calculate respectively an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture, and perform training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculate respectively a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, and calculate, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture, wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score, are performed iteratively, until a predetermined iteration termination condition is satisfied.
Qi teaches:
calculate respectively an inter-class loss indicating a separation degree between features of samples of different classes (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on an inter-class loss.),
and a center loss indicating an aggregation degree between features of samples of a same class (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness.");
by utilizing all samples in a training set (Section 3, lines 1-8, "To verify the effectiveness of the contrastive-center loss, we evaluate the experiments on two typical visual tasks: visual classification and face recognition. The experiment results demonstrate our contrastive-center loss can not only improve the accuracy on classification, but also boost the performance on visual recognition. In visual classification, we use two wildly used dataset (MNIST [16] and CIFAR10 [17]). In face recognition, the LFW [18] dataset is used."),
with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The loss calculations for the softmax loss and center loss for a neural network demonstrates performing the loss calculations for each sub-neural network architecture.);
and perform training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; Training a neural network to minimize the softmax loss and center loss demonstrates performing the training for each sub-neural network architecture.);
calculate respectively a classification accuracy (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on the classification accuracy.),
and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; The center loss reads on the feature distribution score.),
and calculate, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The total loss reads on a reward score, where the total loss after training is complete represents a reward score, with a low loss value corresponding to positive performance in classification accuracy and feature distribution.),
wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The process of training a neural network with the joint supervision of softmax loss and center loss to determine a reward score can be repeated for each iteration of the recurrent neural network performing the neural network architecture search.).
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
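For illustration only, the joint supervision of Eq. (2) quoted from section 2.1 of Qi, L = Ls + λLc, may be sketched as follows. The sketch makes assumptions that are not stated in the quoted passages: the center loss is taken as half the squared Euclidean distance between a feature and its class center, the class centers are fixed rather than learned, and the loss is averaged over the batch. All function names and numeric values are hypothetical.

```python
import math


def softmax_loss(logits, label):
    """Cross-entropy of a softmax over `logits` against integer `label` (Ls)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance from a feature to its class center (Lc)."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(batch, centers, lam):
    """Mean of Ls + lam * Lc over a batch of (feature, logits, label) triples."""
    total = 0.0
    for feature, logits, label in batch:
        total += softmax_loss(logits, label) + lam * center_loss(feature, centers[label])
    return total / len(batch)


# Hypothetical two-class example with 2-D features and fixed class centers.
centers = {0: [0.0, 0.0], 1: [2.0, 2.0]}
batch = [
    ([0.0, 0.0], [0.0, 0.0], 0),  # feature sits exactly on its class center
    ([2.0, 4.0], [0.0, 0.0], 1),  # feature is 2 units away from its class center
]
loss = joint_loss(batch, centers, lam=0.5)
```

In this sketch the first sample contributes only its softmax term, while the second adds a center-loss penalty scaled by λ, so minimizing the total pulls features toward their class centers while preserving classification performance.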
Zoph in view of Qi does not specifically disclose: wherein the processor is configured to define the search space for open-set recognition.
Wu teaches:
wherein the processor is configured to define the search space for open-set recognition (Column 5, lines 7-11, "The detector can use any of a number of different types of object detection algorithms, as may relate to use of a convolutional neural network, including algorithms such as Faster R-CNN, SSD and YOLO, among others as discussed elsewhere herein."; Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 10, lines 55-61, "Once at least one object region is identified, the object region can be selected 610 for verification. In order to reduce the search space, or otherwise reduce the amount of processing that would otherwise be needed to analyze a large set of images, a subset of similar object images can be determined 612 that are related in some way to the content of the object region to be verified.").
Wu teaches a search space for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wu to use a search space for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 4, Zoph in view of Qi and Wu discloses the neural network architecture search apparatus as claimed in claim 1.
Qi further teaches:
wherein the feature distribution score is calculated based on the center loss (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; The center loss can be used for the feature distribution score, where a low loss value represents positive feature distribution performance.),
and the classification accuracy is calculated based on the inter-class loss (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on the inter-class loss, and the softmax loss can be used for the classification accuracy, where a low loss value represents positive classification accuracy performance.).
Qi teaches determining feature distribution based on the center loss and determining classification accuracy based on the inter-class loss in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to further incorporate the teachings of Qi to determine feature distribution based on the center loss and determine classification accuracy based on the inter-class loss.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Regarding claim 6, Zoph in view of Qi and Wu discloses the neural network architecture search apparatus as claimed in claim 1.
Wu further teaches:
wherein the at least one sub-neural network architecture obtained at a time when the predetermined iteration termination condition is satisfied is used for open-set recognition (Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 6, lines 18-22, "The classifier in some embodiments can be part of a neural network or machine learning algorithm, where the output once verified can be fed back into the machine learning algorithm for additional training.").
Wu teaches a neural network used for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to further incorporate the teachings of Wu to implement the recurrent neural network performing the neural network architecture search from Zoph to obtain the neural network for open-set recognition from Wu.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 8, Zoph discloses a neural network architecture search method, comprising:
defining a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
performing sampling on the architecture parameters in the search space based on parameters of a recurrent neural network, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
and feeding back the reward score to the recurrent neural network, and causing the parameters of the recurrent neural network to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."),
wherein processing of feeding back the reward score and adjusting the parameters are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
Zoph does not specifically disclose: calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculating respectively a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture, wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score are performed iteratively, until a predetermined iteration termination condition is satisfied.
Qi teaches:
calculating an inter-class loss indicating a separation degree between features of samples of different classes (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on an inter-class loss.),
and a center loss indicating an aggregation degree between features of samples of a same class (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness.");
by utilizing all samples in a training set (Section 3, lines 1-8, "To verify the effectiveness of the contrastive-center loss, we evaluate the experiments on two typical visual tasks: visual classification and face recognition. The experiment results demonstrate our contrastive-center loss can not only improve the accuracy on classification, but also boost the performance on visual recognition. In visual classification, we use two wildly used dataset (MNIST [16] and CIFAR10 [17]). In face recognition, the LFW [18] dataset is used."),
with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The loss calculations for the softmax loss and center loss for a neural network demonstrate performing the loss calculations for each sub-neural network architecture.);
and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; Training a neural network to minimize the softmax loss and center loss demonstrates performing the training for each sub-neural network architecture.);
calculating respectively a classification accuracy (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on the classification accuracy.),
and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; The center loss reads on the feature distribution score.),
and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The total loss reads on a reward score, where the total loss after training is complete represents a reward score, with a low loss value corresponding to positive performance in classification accuracy and feature distribution.),
wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The process of training a neural network with the joint supervision of softmax loss and center loss to determine a reward score can be repeated for each iteration of the recurrent neural network performing the neural network architecture search.).
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
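For illustration only, the joint supervision quoted from Qi, L = Ls + λLc, can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Qi or from the application: the center loss is taken here as half the squared Euclidean distance between a feature and its class center, the total is averaged over the batch, and the function names and the (logits, feature, label) data layout are hypothetical.

```python
import math


def softmax_loss(logits, label):
    """Ls: cross-entropy of softmax(logits) against the integer class label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Lc: half the squared Euclidean distance to the class center (assumed form)."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(batch, centers, lam):
    """Mean of L = Ls + lam * Lc over (logits, feature, label) triples."""
    total = 0.0
    for logits, feature, label in batch:
        total += softmax_loss(logits, label) + lam * center_loss(feature, centers[label])
    return total / len(batch)
```

The scalar `lam` plays the role of λ in Eq. (2): setting it to zero recovers training under the softmax loss alone, while larger values weight intra-class compactness more heavily.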
Zoph in view of Qi does not specifically disclose: wherein in the processing of defining search space, the search space is defined for open-set recognition.
Wu teaches:
wherein in the processing of defining search space, the search space is defined for open-set recognition (Column 5, lines 7-11, "The detector can use any of a number of different types of object detection algorithms, as may relate to use of a convolutional neural network, including algorithms such as Faster R-CNN, SSD and YOLO, among others as discussed elsewhere herein."; Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 10, lines 55-61, "Once at least one object region is identified, the object region can be selected 610 for verification. In order to reduce the search space, or otherwise reduce the amount of processing that would otherwise be needed to analyze a large set of images, a subset of similar object images can be determined 612 that are related in some way to the content of the object region to be verified.").
Wu teaches a search space for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wu to use a search space for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 11, Zoph in view of Qi and Wu discloses the neural network architecture search method as claimed in claim 8.
Qi further teaches:
wherein the feature distribution score is calculated based on the center loss (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; The center loss can be used for the feature distribution score, where a low loss value represents positive feature distribution performance.),
and the classification accuracy is calculated based on the inter-class loss (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on the inter-class loss, and the softmax loss can be used for the classification accuracy, where a low loss value represents positive classification accuracy performance.).
Qi teaches determining feature distribution based on the center loss and determining classification accuracy based on the inter-class loss in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to further incorporate the teachings of Qi to determine feature distribution based on the center loss and determine classification accuracy based on the inter-class loss.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Regarding claim 13, Zoph in view of Qi and Wu discloses the neural network architecture search method as claimed in claim 8.
Wu further teaches:
wherein the at least one sub-neural network architecture obtained at a time when the predetermined iteration termination condition is satisfied is used for open-set recognition (Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 6, lines 18-22, "The classifier in some embodiments can be part of a neural network or machine learning algorithm, where the output once verified can be fed back into the machine learning algorithm for additional training.").
Wu teaches a neural network used for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to further incorporate the teachings of Wu to implement the recurrent neural network performing the neural network architecture search from Zoph to obtain the neural network for open-set recognition from Wu.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Regarding claim 14, Zoph discloses a non-transitory computer readable recording medium having stored thereon a program that causes a computer to perform a process (Section 5, lines 1-6, "In this paper we introduce Neural Architecture Search, an idea of using a recurrent neural network to compose neural network architectures. By using recurrent network as the controller, our method is flexible so that it can search variable-length architecture space. Our method has strong empirical performance on very challenging benchmarks and presents a new research direction for automatically finding good neural network architectures. The code for running the models found by the controller on CIFAR-10 and PTB will be released"; The code for running the models demonstrates that the recurrent neural network controller is implemented in software executing from memory and running on a processor.),
the process comprising:
defining a search space used as a set of architecture parameters describing the neural network architecture (Section 4.1, lines 5-7, "Search space: Our search space consists of convolutional architectures, with rectified linear units as non-linearities (Nair & Hinton, 2010), batch normalization (Ioffe & Szegedy, 2015) and skip connections between layers (Section 3.3).");
performing sampling on the architecture parameters in the search space based on parameters of a recurrent neural network, to generate at least one sub-neural network architecture (Section 3.1, lines 1-4, "In Neural Architecture Search, we use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Let’s suppose we would like to predict feedforward neural networks with only convolutional layers, we can use the controller to generate their hyperparameters as a sequence of tokens"; Figure 2, "How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.");
and feeding back the reward score to the recurrent neural network, and causing the parameters of the recurrent neural network to be adjusted towards a direction in which the reward scores of the at least one sub-neural network architecture are larger (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."),
wherein processing of feeding back the reward score and adjusting the parameters are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 1, lines 10-16, "Our work is based on the observation that the structure and connectivity of a neural network can be typically specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such string. Training the network specified by the string – the “child network” – on the real data will result in an accuracy on a validation set. Using this accuracy as the reward signal, we can compute the policy gradient to update the controller. As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies. In other words, the controller will learn to improve its search over time."; Section 4.2, line 63, "Training is stopped at 800K steps.").
Zoph does not specifically disclose: calculating an inter-class loss indicating a separation degree between features of samples of different classes and a center loss indicating an aggregation degree between features of samples of a same class by utilizing all samples in a training set, with respect to each sub-neural network architecture of the at least one sub-neural network architecture, and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss; calculating respectively a classification accuracy and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained, and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture, wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score, are performed iteratively, until a predetermined iteration termination condition is satisfied.
Qi teaches:
calculating an inter-class loss indicating a separation degree between features of samples of different classes (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on an inter-class loss.),
and a center loss indicating an aggregation degree between features of samples of a same class (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness.");
by utilizing all samples in a training set (Section 3, lines 1-8, "To verify the effectiveness of the contrastive-center loss, we evaluate the experiments on two typical visual tasks: visual classification and face recognition. The experiment results demonstrate our contrastive-center loss can not only improve the accuracy on classification, but also boost the performance on visual recognition. In visual classification, we use two wildly used dataset (MNIST [16] and CIFAR10 [17]). In face recognition, the LFW [18] dataset is used."),
with respect to each sub-neural network architecture of the at least one sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The loss calculations for the softmax loss and center loss for a neural network demonstrate performing the loss calculations for each sub-neural network architecture.);
and performing training on each sub-neural network architecture by minimizing a loss function including the inter-class loss and the center loss (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; Training a neural network to minimize the softmax loss and center loss demonstrates performing the training for each sub-neural network architecture.);
calculating respectively a classification accuracy (Section 1, lines 7-11, "For the discriminative features extracted from CNN, the performance is usually much higher than other traditional machine learning algorithms. Usually, the CNN maps images to high dimension space to let the softmax or SVM easy to classify the images to a certain class."; Section 3.1, lines 21-23, "Under the single supervision signal of softmax loss, the features are separable, but with significant intra-class variations."; The softmax loss reads on the classification accuracy.),
and a feature distribution score indicating a compactness degree between features of samples belonging to a same class by utilizing all samples in a validation set, with respect to each sub-neural network architecture having been trained (Section 1, lines 39-43, "The center loss, which learns a center for each class and penalizes the distances between the deep features and their corresponding class centers, is a new novel loss to enforce extra intraclass compactness."; The center loss reads on the feature distribution score.),
and calculating, based on the classification accuracy and the feature distribution score of each sub-neural network architecture, a reward score of each sub-neural network architecture (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The total loss reads on a reward score, where the total loss after training is complete represents a reward score, with a low loss value corresponding to positive performance in classification accuracy and feature distribution.),
wherein processing, of performing sampling, calculating the inter-class loss and the center loss and performing training on each sub-neural network architecture, calculating the classification accuracy and the feature distribution score and calculating the reward score are performed iteratively, until a predetermined iteration termination condition is satisfied (Section 2.1, lines 16-22, "When training the deep neural networks, authors in [13] adopt the joint supervision of softmax loss and center loss to train the networks, as formulated in Eq. (2). L = Ls + λLc (2) Where L denotes the total loss of deep neural network. Ls denotes the softmax loss. Lc denotes the center loss. λ denotes the scalar used for balancing the two loss functions."; The process of training a neural network with the joint supervision of softmax loss and center loss to determine a reward score can be repeated for each iteration of the recurrent neural network performing the neural network architecture search.).
Qi teaches calculating inter-class loss and center loss and using a loss function that includes the inter-class loss and center loss to train a neural network in order to enhance the discriminative power of features for training a neural network (Abstract, lines 1-12, "The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers.").
Zoph and Qi are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph to incorporate the teachings of Qi to calculate inter-class loss and center loss and use a loss function that includes the inter-class loss and center loss to train a neural network.  Doing so would allow for enhancing the discriminative power of features for training a neural network.
Zoph in view of Qi does not specifically disclose: wherein in the processing of defining search space, the search space is defined for open-set recognition.
Wu teaches:
wherein in the processing of defining search space, the search space is defined for open-set recognition (Column 5, lines 7-11, "The detector can use any of a number of different types of object detection algorithms, as may relate to use of a convolutional neural network, including algorithms such as Faster R-CNN, SSD and YOLO, among others as discussed elsewhere herein."; Column 7, lines 18-23, "Approaches in accordance with various embodiments discussed herein can instead provide for open set recognition, which enables the recognition of arbitrary objects, such as logos, products, or various other types of objects, without additional training of the relevant models or obtaining of additional training data."; Column 10, lines 55-61, "Once at least one object region is identified, the object region can be selected 610 for verification. In order to reduce the search space, or otherwise reduce the amount of processing that would otherwise be needed to analyze a large set of images, a subset of similar object images can be determined 612 that are related in some way to the content of the object region to be verified.").
Wu teaches a search space for open-set recognition in order to support the recognition of arbitrary items without additional training (Column 7, lines 61-63, "In contrast, open-set logo recognition methods such as those described herein can support the recognition of arbitrary logos without additional training.").
Zoph, Qi, and Wu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi to incorporate the teachings of Wu to use a search space for open-set recognition.  Doing so would allow for supporting the recognition of arbitrary items without additional training.
Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph in view of Qi and Wu, and further in view of Tate et al. (US Patent No. 10,762,606), hereinafter Tate.
Regarding claim 3, Zoph in view of Qi and Wu discloses the neural network architecture search apparatus as claimed in claim 1.  Zoph further discloses:
define the neural network architecture as including a predetermined number of block units and the predetermined number of feature integration layers, and define a structure of each feature integration layer of the predetermined number of feature integration layers in advance ("To limit the search space complexity we have our model predict 13 layers where each layer prediction is a fully connected block of 3 layers."),
perform sampling on the architecture parameters in the search space, and form each block unit of the predetermined number of block units, so as to generate each sub-neural network architecture of the at least one sub-neural network architecture (Section 3.4, lines 17-21, "To make this process more clear, we show an example in Figure 5, for a tree structure that has two leaf nodes and one internal node. The leaf nodes are indexed by 0 and 1, and the internal node is indexed by 2. The controller RNN needs to first predict 3 blocks, each block specifying a combination method and an activation function for each tree index. After that it needs to predict the last 2 blocks that specify how to connect ct and ct-1 to temporary variables inside the tree."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.").
Zoph in view of Qi and Wu does not specifically disclose: performing transformation on features of samples, performing integration on the features of the samples which are arranged in series, wherein one of the feature integration layers is arranged downstream of each block unit.
Tate teaches:
performing transformation on features of samples (Column 4, lines 41-45, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process."),
performing integration on the features of the samples which are arranged in series (Column 4, lines 46-51, "Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN."),
wherein one of the feature integration layers is arranged downstream of each block unit (Column 4, lines 41-51, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process. Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN.").
Tate teaches performing feature transformations followed by performing feature integration in order to generate a high-quality image from low-quality images from different viewpoint positions (Column 17, lines 63-66, "As described above, in the embodiment, the mode has been described in which low quality images from different viewpoint positions are integrated to generate a high quality image.").
Zoph, Qi, Wu, and Tate are considered to be analogous to the claimed invention because they are in the same field of neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to incorporate the teachings of Tate to perform feature transformations followed by feature integration.  Doing so would allow for generating a high-quality image from low-quality images from different viewpoint positions.
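For illustration only, the arrangement mapped above (a predetermined number of block units formed by sampling, with a feature integration layer of predefined structure downstream of each block unit) can be sketched as follows. This is a minimal sketch and is not code from Zoph, Qi, Wu, or Tate: the names `OPS` and `sample_architecture` and the placeholder `"concat"` integration layer are assumptions introduced here, and uniform random sampling stands in for a controller RNN.

```python
import random

# Hypothetical operation set; these names are illustrative assumptions only.
OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]


def sample_architecture(num_blocks, ops_per_block=2, integration="concat", seed=None):
    """Sample one sub-network description as a flat list of layers.

    Each block unit is formed by sampling operations from OPS. The feature
    integration layer is fixed in advance (never sampled) and is placed
    directly after every block unit.
    """
    rng = random.Random(seed)
    layers = []
    for i in range(num_blocks):
        block = [rng.choice(OPS) for _ in range(ops_per_block)]
        layers.append(("block", i, block))
        layers.append(("integration", i, integration))
    return layers


# Three block units, each followed by the predefined integration layer.
arch = sample_architecture(num_blocks=3, seed=0)
```

Only the contents of the block units vary between samples; the number of block units and the integration layers stay constant across every sampled sub-network.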
Regarding claim 10, Zoph in view of Qi and Wu discloses the neural network architecture search method as claimed in claim 8.  Zoph further discloses:
in the processing of defining search space, a structure of each feature integration layer of the predetermined number of feature integration layers is defined in advance ("To limit the search space complexity we have our model predict 13 layers where each layer prediction is a fully connected block of 3 layers."),
and in the processing of performing sampling, sampling is performed on the architecture parameters in the search space based on parameters of the recurrent neural network, to form each block unit of the predetermined number of block units, so as to generate each sub-neural network architecture of the at least one sub-neural network architecture (Section 3.4, lines 17-21, "To make this process more clear, we show an example in Figure 5, for a tree structure that has two leaf nodes and one internal node. The leaf nodes are indexed by 0 and 1, and the internal node is indexed by 2. The controller RNN needs to first predict 3 blocks, each block specifying a combination method and an activation function for each tree index. After that it needs to predict the last 2 blocks that specify how to connect c_t and c_{t-1} to temporary variables inside the tree."; Section 4.1, lines 17-18, "Once the controller RNN samples an architecture, a child model is constructed and trained for 50 epochs.").
Zoph in view of Qi and Wu does not specifically disclose: performing transformation on features of samples, performing integration on the features of the samples which are arranged in series, wherein one of the feature integration layers is arranged downstream of each block unit.
Tate teaches:
performing transformation on features of samples (Column 4, lines 41-45, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process."),
performing integration on the features of the samples which are arranged in series (Column 4, lines 46-51, "Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN."),
wherein one of the feature integration layers is arranged downstream of each block unit (Column 4, lines 41-51, "The image processing apparatus 10 of the embodiment performs feature transformations 308a to 308c that transforms the low quality images 301a to 301c into features 303a to 303c of the CNN. The feature transformations 308a to 308c each include a two-stage convolution process. Next, the image processing apparatus 10 is different from the known technique in the respect of including a process of feature integration 309. In the process of the feature integration 309, the features 303a to 303c of the CNN of the low quality images are connected to obtain one feature 304 of the CNN.").
Tate teaches performing feature transformations followed by performing feature integration in order to generate a high-quality image from low-quality images from different viewpoint positions (Column 17, lines 63-66, "As described above, in the embodiment, the mode has been described in which low quality images from different viewpoint positions are integrated to generate a high quality image.").
Zoph, Qi, Wu, and Tate are considered to be analogous to the claimed invention because they are in the same field of neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to incorporate the teachings of Tate to perform feature transformations followed by feature integration.  Doing so would allow for generating a high-quality image from low-quality images from different viewpoint positions.
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Zoph in view of Qi and Wu, and further in view of Liu et al. ("Progressive Neural Architecture Search"), hereinafter Liu.
Regarding claim 5, Zoph in view of Qi and Wu discloses the neural network architecture search apparatus as claimed in claim 1, but does not specifically disclose:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip.
Liu teaches:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip (Section 3.1, lines 15-20, "The operator space O is the following set of 8 functions, each of which operates on a single tensor: • 3x3 depthwise-separable convolution  • 5x5 depthwise-separable convolution  • 7x7 depthwise-separable convolution  • 1x7 followed by 7x1 convolution  • identity  • 3x3 average pooling  • 3x3 max pooling  • 3x3 dilated convolution").
Liu teaches a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution in order to efficiently use machine learning to determine the structure of convolutional neural networks (Abstract, lines 1-4, "We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.").
Zoph, Qi, Wu, and Liu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to incorporate the teachings of Liu to use a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution.  Doing so would allow for efficiently using machine learning to determine the structure of convolutional neural networks.
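For illustration only, the operator space quoted from Liu, Section 3.1, can be written out as a plain enumeration from which a search procedure draws candidate operations. This is a minimal sketch: the identifiers `LIU_OPERATOR_SPACE` and `sample_operator` are assumptions introduced here, and uniform random sampling is a stand-in, not Liu's progressive search procedure.

```python
import random

# The eight operators listed in the quoted passage of Liu, Section 3.1.
LIU_OPERATOR_SPACE = (
    "3x3 depthwise-separable convolution",
    "5x5 depthwise-separable convolution",
    "7x7 depthwise-separable convolution",
    "1x7 followed by 7x1 convolution",
    "identity",
    "3x3 average pooling",
    "3x3 max pooling",
    "3x3 dilated convolution",
)


def sample_operator(rng):
    """Draw one operator uniformly at random (illustrative stand-in only)."""
    return rng.choice(LIU_OPERATOR_SPACE)


rng = random.Random(0)
chosen = [sample_operator(rng) for _ in range(4)]
```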
Regarding claim 12, Zoph in view of Qi and Wu discloses the neural network architecture search method as claimed in claim 8, but does not specifically disclose:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip.
Liu teaches:
wherein the set of architecture parameters comprises any combination of 3x3 convolutional kernel, 5x5 convolutional kernel, 3x3 depthwise separate convolution, 5x5 depthwise separate convolution, 3x3 Max pool, 3x3 Avg pool, Identity residual skip, Identity residual no skip (Section 3.1, lines 15-20, "The operator space O is the following set of 8 functions, each of which operates on a single tensor: • 3x3 depthwise-separable convolution  • 5x5 depthwise-separable convolution  • 7x7 depthwise-separable convolution  • 1x7 followed by 7x1 convolution  • identity  • 3x3 average pooling  • 3x3 max pooling  • 3x3 dilated convolution").
Liu teaches a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution in order to efficiently use machine learning to determine the structure of convolutional neural networks (Abstract, lines 1-4, "We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms.").
Zoph, Qi, Wu, and Liu are considered to be analogous to the claimed invention because they are in the same field of training neural networks.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zoph in view of Qi and Wu to incorporate the teachings of Liu to use a set of architecture parameters including 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, 7x7 depthwise-separable convolution, 1x7 followed by 7x1 convolution, identity, 3x3 average pooling, 3x3 max pooling, and 3x3 dilated convolution.  Doing so would allow for efficiently using machine learning to determine the structure of convolutional neural networks.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657