DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Preliminary Amendment
The Preliminary Amendment filed April 29, 2022 has been entered. 
Claims 16-20 have been canceled. 
Claims 1-15 are pending in this application. 

Drawings
The drawings were received on April 29, 2022 and June 28, 2019.  These drawings are acceptable.

Specification
The abstract of the disclosure is objected to because the abstract exceeds 150 words.  The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words.  See MPEP § 608.01(b)(C).  Correction is required.  See id. at 608.01(b).

Claim Objections
Claim 5 is objected to because of the following informalities:  
Claim 5, page 20, line 8, “Lis” should read as “L is”.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 5 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Panchapagesan et al. (U.S. Patent No. 10,147,442 B1), hereinafter referred to as Panchapagesan.

Regarding claim 1, Panchapagesan discloses: In a deep neural network including a plurality of L layers, wherein L is an integer, each layer having a plurality of nodes 
(2:13–19: “The present disclosure is directed to a neural network acoustic model trained to be robust and produce accurate output when used to process speech signals having acoustic interference. Generally described, neural networks, including but not limited to deep neural networks (“DNNs”), have multiple layers of nodes, and nodes of adjacent layers may be connected to each other.”
The Examiner finds that the deep neural networks (“DNNs”) having multiple layers of nodes and nodes of adjacent layers that are connected to each other, as disclosed in Panchapagesan, teach the claimed “deep neural network including a plurality of L layers, wherein L is an integer, each layer having a plurality of nodes”.), a method comprising:
for each L layer in the plurality of layers, randomly connecting nodes of each L layer to nodes in a L+1 layer;
for each L+1 layer in the plurality of layers, connecting nodes of each L+1 layer to nodes in a subsequent L layer in a one-to-one manner (i.e., first subset of hidden layers 104 may be provided to the side task hidden layers 106);
fixing parameters related to the nodes of each L layer; and
updating parameters related to the nodes of each L+1 layers, wherein L is an integer starting with 1
(2:21–23: “Each connection between the various nodes of adjacent layers may be associated with a respective weight.”
6:3–8: “After the first subset of hidden layers 104 has completed processing, output from the last hidden layer of the first subset of hidden layers 104 may be provided to the side task hidden layers 106 (or, in some embodiments, directly to one or both of output layers 108 and 110), which may then apply weights, biases, and activation functions as described above.”
10:40–11:45: “At block 406, the computing system 600 can determine certain parameters, known as “hyper parameters,” for use during the training process. The source-style-separation training process seeks to optimize all three outputs of the neural network 100, rather than only the main acoustic model output. To do so, different loss functions may be used to determine the error for each of the three different outputs—the main acoustic model output and the two side task outputs. Model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) can then be updated to optimize model output such that a weighted composite of all three loss functions is minimized (e.g., the total error of model output is minimized). In this example, the hyper parameters are the weighting factors for each of the individual loss functions in the weighted composite loss function. The weighting factors are typically different for each of the loss functions due to the different types of loss functions used. For example, a cross entropy loss function may be used to determine the error in the main acoustic model output, because such functions perform well for determining the error in discrete outputs such as the discrete set of probabilities in the main acoustic model output. As another example, an L2 loss function may be used to determine the error in one or both of side task outputs, because such functions perform well for determining the error in continuous outputs such as the reference signal predictions. The process 500 shown in FIG. 5 and described in detail below is one example of a process for optimizing a weighted composite of loss functions during neural network training.”
“In some embodiments, the hyper parameter values may be determined by experimenting with several different sets of values, and selecting the best performing set. The computing system 600 can generate (or a technician can provide the computing system 600 with) a range of values that the weights can take. For example, the weight for the main acoustic model output may be fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks may each, jointly or independently, be assigned a value within the range of 0.01 to 0.1. The computing system 600 can pick n values (where n is some integer) within that range for each of the side tasks, resulting in n² combinations of possible hyper parameter values (assuming the weight for the main output is always 1). The computing system 600 can then perform a grid search of the possible values by repeating the remaining portions of the process 400 for each of the possible combinations to generate n² trained neural networks. Each of the n² neural networks can then be tested on another set of training data, and the best-performing (e.g., most accurate) neural network can be selected for deployment.”
“At block 408, the computing system 600 can input an input vector or subset of training data input vectors into the neural network 100. For example, an input vector may have n elements (wherein n is some integer), and the first hidden layer of the first subset of hidden layers 104 of the neural network 100 may have n nodes. Each element of the input vector may correspond to a node of the first hidden layer of the first subset of hidden layers 104. The first hidden layer of the first subset of hidden layers 104 may apply weights, biases, and activation functions to the input vector and pass the resulting values to the next hidden layer of the neural network 100 as described in greater detail above.”
The Examiner notes the parameters are weights and biases associated with connections between nodes and individual nodes.
The Examiner finds the weight for the main acoustic model output (i.e., output from the last hidden layer of the first subset of hidden layers) being fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks each, jointly or independently, being assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “for each L layer in the plurality of layers, randomly connecting nodes of each L layer to nodes in a L+1 layer; for each L+1 layer in the plurality of layers, connecting nodes of each L+1 layer to nodes in a subsequent L layer in a one-to-one manner; fixing parameters related to the nodes of each L layer;  and . . . wherein L is an integer starting with 1.” 
The Applicant’s specification describes the claimed “randomly connecting nodes of each L layer” as “initial weights [that] may be chosen randomly”. See Spec. ¶ 169. Consistent with the description in the Specification of the initial values of the weights being assigned randomly, the Examiner finds the weights for the side tasks being jointly assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “for each L layer in the plurality of layers, randomly connecting nodes of each L layer to nodes in a L+1 layer; for each L+1 layer in the plurality of layers, connecting nodes of each L+1 layer to nodes in a subsequent L layer in a one-to-one manner”. 
The Examiner further finds the weight for the main acoustic model output being fixed at 1.0 due to its importance in the overall training process as disclosed in Panchapagesan teaches the claimed “fixing parameters related to the nodes of each L layer; . . . wherein L is an integer starting with 1.”
Lastly, the Examiner finds the model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) being updated to optimize model output as disclosed in Panchapagesan teaches the claimed “updating parameters related to the nodes of each L+1 layers.”).
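For illustration only, the hyper parameter selection quoted above from Panchapagesan (10:40–11:45) can be sketched as follows. This is a minimal sketch and not Panchapagesan's implementation: the function names, the even spacing of candidate values, and the simplified loss helpers are all assumptions made for this example. It shows the main-output loss weight held at 1.0, n candidate weights per side task drawn from the range 0.01 to 0.1, and the resulting n² weight combinations that would each be used to train and test one network.

```python
import itertools
import math

MAIN_WEIGHT = 1.0  # loss weight for the main output, fixed per the quoted passage


def candidate_weights(n, low=0.01, high=0.1):
    """Return n values in [low, high]; even spacing is an assumption."""
    if n == 1:
        return [low]
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def weight_grid(n):
    """Return all n*n (main, side_a, side_b) loss-weight combinations."""
    values = candidate_weights(n)
    return [(MAIN_WEIGHT, a, b) for a, b in itertools.product(values, repeat=2)]


def cross_entropy(probs, target_index):
    """Cross entropy loss for a discrete probability output (hypothetical helper)."""
    return -math.log(probs[target_index])


def l2_loss(predicted, reference):
    """Sum of squared errors for a continuous output (hypothetical helper)."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference))


def composite_loss(weights, main_loss, side_a_loss, side_b_loss):
    """Weighted composite of the three per-output losses."""
    w_main, w_a, w_b = weights
    return w_main * main_loss + w_a * side_a_loss + w_b * side_b_loss


if __name__ == "__main__":
    grid = weight_grid(3)
    print(len(grid))  # number of hyper parameter combinations to train and test
```

In this sketch the values being fixed or varied are loss-weighting hyper parameters, which are distinct from the model parameters (weights and biases of nodes and connections) that training updates.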

Regarding claim 5, Panchapagesan discloses: A deep neural network comprising:
a plurality of L layers, each layer having a plurality of nodes 
(2:13–19: “The present disclosure is directed to a neural network acoustic model trained to be robust and produce accurate output when used to process speech signals having acoustic interference. Generally described, neural networks, including but not limited to deep neural networks (“DNNs”), have multiple layers of nodes, and nodes of adjacent layers may be connected to each other.”
The Examiner finds that the deep neural networks (“DNNs”) having multiple layers of nodes and nodes of adjacent layers that are connected to each other, as disclosed in Panchapagesan, teach the claimed “deep neural network comprising: a plurality of L layers, each layer having a plurality of nodes”.), wherein for each L layer in the plurality of layers, the nodes of each L layer are randomly connected to nodes in a L+1 layer, and
for each L+1 layer in the plurality of layers, the nodes of each L+1 layer to are connected to nodes in a subsequent L layer in a one-to-one manner (i.e., first subset of hidden layers 104 may be provided to the side task hidden layers 106),
wherein parameters related to the nodes of each L layer are fixed, and 
parameters related to the nodes of each L+1 layers are updated, and wherein L is an integer
(2:21–23: “Each connection between the various nodes of adjacent layers may be associated with a respective weight.”
6:3–8: “After the first subset of hidden layers 104 has completed processing, output from the last hidden layer of the first subset of hidden layers 104 may be provided to the side task hidden layers 106 (or, in some embodiments, directly to one or both of output layers 108 and 110), which may then apply weights, biases, and activation functions as described above.”
10:40–11:45: “At block 406, the computing system 600 can determine certain parameters, known as “hyper parameters,” for use during the training process. The source-style-separation training process seeks to optimize all three outputs of the neural network 100, rather than only the main acoustic model output. To do so, different loss functions may be used to determine the error for each of the three different outputs—the main acoustic model output and the two side task outputs. Model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) can then be updated to optimize model output such that a weighted composite of all three loss functions is minimized (e.g., the total error of model output is minimized). In this example, the hyper parameters are the weighting factors for each of the individual loss functions in the weighted composite loss function. The weighting factors are typically different for each of the loss functions due to the different types of loss functions used. For example, a cross entropy loss function may be used to determine the error in the main acoustic model output, because such functions perform well for determining the error in discrete outputs such as the discrete set of probabilities in the main acoustic model output. As another example, an L2 loss function may be used to determine the error in one or both of side task outputs, because such functions perform well for determining the error in continuous outputs such as the reference signal predictions. The process 500 shown in FIG. 5 and described in detail below is one example of a process for optimizing a weighted composite of loss functions during neural network training.”
“In some embodiments, the hyper parameter values may be determined by experimenting with several different sets of values, and selecting the best performing set. The computing system 600 can generate (or a technician can provide the computing system 600 with) a range of values that the weights can take. For example, the weight for the main acoustic model output may be fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks may each, jointly or independently, be assigned a value within the range of 0.01 to 0.1. The computing system 600 can pick n values (where n is some integer) within that range for each of the side tasks, resulting in n² combinations of possible hyper parameter values (assuming the weight for the main output is always 1). The computing system 600 can then perform a grid search of the possible values by repeating the remaining portions of the process 400 for each of the possible combinations to generate n² trained neural networks. Each of the n² neural networks can then be tested on another set of training data, and the best-performing (e.g., most accurate) neural network can be selected for deployment.”
“At block 408, the computing system 600 can input an input vector or subset of training data input vectors into the neural network 100. For example, an input vector may have n elements (wherein n is some integer), and the first hidden layer of the first subset of hidden layers 104 of the neural network 100 may have n nodes. Each element of the input vector may correspond to a node of the first hidden layer of the first subset of hidden layers 104. The first hidden layer of the first subset of hidden layers 104 may apply weights, biases, and activation functions to the input vector and pass the resulting values to the next hidden layer of the neural network 100 as described in greater detail above.”
The Examiner notes the parameters are weights and biases associated with connections between nodes and individual nodes.
The Examiner finds the weight for the main acoustic model output (i.e., output from the last hidden layer of the first subset of hidden layers) being fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks each, jointly or independently, being assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “wherein for each L layer in the plurality of layers, the nodes of each L layer are randomly connected to nodes in a L+1 layer, and for each L+1 layer in the plurality of layers, the nodes of each L+1 layer to are connected to nodes in a subsequent L layer in a one-to-one manner, wherein parameters related to the nodes of each L layer are fixed. . . wherein L is an integer.” 
The Applicant’s specification describes the claimed “randomly connecting nodes of each L layer” as “initial weights [that] may be chosen randomly”. See Spec. ¶ 169. Consistent with the description in the Specification of the initial values of the weights being assigned randomly, the Examiner finds the weights for the side tasks being jointly assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “wherein for each L layer in the plurality of layers, the nodes of each L layer are randomly connected to nodes in a L+1 layer, and for each L+1 layer in the plurality of layers, the nodes of each L+1 layer to are connected to nodes in a subsequent L layer in a one-to-one manner”. 
The Examiner further finds the weight for the main acoustic model output being fixed at 1.0 due to its importance in the overall training process as disclosed in Panchapagesan teaches the claimed “wherein parameters related to the nodes of each L layer are fixed. . . wherein L is an integer.”
Lastly, the Examiner finds the model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) being updated to optimize model output as disclosed in Panchapagesan teaches the claimed “parameters related to the nodes of each L+1 layers are updated”.).

Allowable Subject Matter
Claims 2-4 and 6-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 9-15 are allowed.
The following is a statement of reasons for the indication of allowable subject matter:  

Regarding independent claim 9, Panchapagesan et al. (U.S. Patent No. 10,147,442 B1) discloses: A system for a server comprising:
. . . a deep neural network (DNN) having a plurality of L layers, each L layer includes a plurality of nodes
 (2:13–19: “The present disclosure is directed to a neural network acoustic model trained to be robust and produce accurate output when used to process speech signals having acoustic interference. Generally described, neural networks, including but not limited to deep neural networks (“DNNs”), have multiple layers of nodes, and nodes of adjacent layers may be connected to each other.”
The Examiner finds that the deep neural networks (“DNNs”) having multiple layers of nodes and nodes of adjacent layers that are connected to each other, as disclosed in Panchapagesan, teach the claimed “system for a server comprising: . . . a deep neural network (DNN) having a plurality of L layers, each L layer includes a plurality of nodes”.), for each L layer in the plurality of layers, nodes are randomly connected to nodes in a L+1 layer, wherein L an integer, and for each L+1 layer in the plurality of L layers, nodes of each L+1 layer are connected to nodes in a subsequent L layer in a one-to-one manner (i.e., first subset of hidden layers 104 may be provided to the side task hidden layers 106) . . . fix parameters related to the nodes of each L layer and to update parameters related to the nodes of each L+1 layers
(2:21–23: “Each connection between the various nodes of adjacent layers may be associated with a respective weight.”
6:3–8: “After the first subset of hidden layers 104 has completed processing, output from the last hidden layer of the first subset of hidden layers 104 may be provided to the side task hidden layers 106 (or, in some embodiments, directly to one or both of output layers 108 and 110), which may then apply weights, biases, and activation functions as described above.”
10:40–11:45: “At block 406, the computing system 600 can determine certain parameters, known as “hyper parameters,” for use during the training process. The source-style-separation training process seeks to optimize all three outputs of the neural network 100, rather than only the main acoustic model output. To do so, different loss functions may be used to determine the error for each of the three different outputs—the main acoustic model output and the two side task outputs. Model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) can then be updated to optimize model output such that a weighted composite of all three loss functions is minimized (e.g., the total error of model output is minimized). In this example, the hyper parameters are the weighting factors for each of the individual loss functions in the weighted composite loss function. The weighting factors are typically different for each of the loss functions due to the different types of loss functions used. For example, a cross entropy loss function may be used to determine the error in the main acoustic model output, because such functions perform well for determining the error in discrete outputs such as the discrete set of probabilities in the main acoustic model output. As another example, an L2 loss function may be used to determine the error in one or both of side task outputs, because such functions perform well for determining the error in continuous outputs such as the reference signal predictions. The process 500 shown in FIG. 5 and described in detail below is one example of a process for optimizing a weighted composite of loss functions during neural network training.”
“In some embodiments, the hyper parameter values may be determined by experimenting with several different sets of values, and selecting the best performing set. The computing system 600 can generate (or a technician can provide the computing system 600 with) a range of values that the weights can take. For example, the weight for the main acoustic model output may be fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks may each, jointly or independently, be assigned a value within the range of 0.01 to 0.1. The computing system 600 can pick n values (where n is some integer) within that range for each of the side tasks, resulting in n² combinations of possible hyper parameter values (assuming the weight for the main output is always 1). The computing system 600 can then perform a grid search of the possible values by repeating the remaining portions of the process 400 for each of the possible combinations to generate n² trained neural networks. Each of the n² neural networks can then be tested on another set of training data, and the best-performing (e.g., most accurate) neural network can be selected for deployment.”
“At block 408, the computing system 600 can input an input vector or subset of training data input vectors into the neural network 100. For example, an input vector may have n elements (wherein n is some integer), and the first hidden layer of the first subset of hidden layers 104 of the neural network 100 may have n nodes. Each element of the input vector may correspond to a node of the first hidden layer of the first subset of hidden layers 104. The first hidden layer of the first subset of hidden layers 104 may apply weights, biases, and activation functions to the input vector and pass the resulting values to the next hidden layer of the neural network 100 as described in greater detail above.”
The Examiner notes the parameters are weights and biases associated with connections between nodes and individual nodes.
The Examiner finds the weight for the main acoustic model output (i.e., output from the last hidden layer of the first subset of hidden layers) being fixed at 1.0 due to its importance in the overall training process, and the weights for the side tasks each, jointly or independently, being assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “for each L layer in the plurality of layers, nodes are randomly connected to nodes in a L+1 layer, wherein L an integer, and for each L+1 layer in the plurality of L layers, nodes of each L+1 layer are connected to nodes in a subsequent L layer in a one-to-one manner.” 
The Applicant’s specification describes the claimed “randomly connecting nodes of each L layer” as “initial weights [that] may be chosen randomly”. See Spec. ¶ 169. Consistent with the description in the Specification of the initial values of the weights being assigned randomly, the Examiner finds the weights for the side tasks being jointly assigned a value within the range of 0.01 to 0.1 as disclosed in Panchapagesan teaches the claimed “for each L layer in the plurality of layers, nodes are randomly connected to nodes in a L+1 layer, wherein L an integer, and for each L+1 layer in the plurality of L layers, nodes of each L+1 layer are connected to nodes in a subsequent L layer in a one-to-one manner”. 
The Examiner further finds the weight for the main acoustic model output being fixed at 1.0 due to its importance in the overall training process as disclosed in Panchapagesan teaches the claimed “fix parameters related to the nodes of each L layer.”
Lastly, the Examiner finds the model parameters (e.g., the weights and biases associated with connections between nodes and individual nodes) being updated to optimize model output as disclosed in Panchapagesan teaches the claimed “update parameters related to the nodes of each L+1 layers”.).
Li, Shijie, et al., “Heterogeneous blocked CPU-GPU accelerate scheme for large scale extreme learning machine”, Neurocomputing, Vol. 261, Pages 153-163, Elsevier B.V. (8 February 2017) discloses: A system for a server comprising a processing core including a deep neural network (DNN) having a plurality of L layers (Figure 2, page 158, col. 2, paragraph 2: “The proposed algorithms were evaluated on a platform equipped with Intel E5-2650 at 2.0GHz, and 512 GB of RAM.”),
each L layer includes a plurality of nodes, for each L layer in the plurality of layers, nodes are randomly connected to nodes in a L+1 layer (page 154, col. 2, last paragraph - page 155, col. 1, paragraph 2: “ELM theories have proved that hidden layer nodes can be generated randomly according to any probability distribution. However, these works focus only on random weights, while ignoring the attribute of random connections. For natural images and languages, the strong local correlations may make the full connections less appropriate . . . Furthermore, simply sharing the input weights to different hidden nodes directly leads to the convolution operation and can be easily implemented. In this way, a specific case for the general ELM-LRF is shown as in Fig. 2. In Fig. 2, the hidden layer is comprised of random convolutional nodes.”) . . . a graphics processor (page 155, col. 1, last paragraph: “[N]ovel blocked GPU-based LU decomposition strategy, we can train an arbitrarily large ELM-LRF model with the acceleration of GPUs using the algorithm.”).
However, the Examiner finds Panchapagesan and Li do not teach or suggest the claimed “system for a server comprising: a processing core including a deep neural network (DNN) having a plurality of L layers, each L layer includes a plurality of nodes, for each L layer in the plurality of layers, nodes are randomly connected to nodes in a L+1 layer, wherein L an integer, and for each L+1 layer in the plurality of L layers, nodes of each L+1 layer are connected to nodes in a subsequent L layer in a one-to-one manner; an I/O controller hub coupled to the processor core to provide network, data storage, and DNN access; and a graphics processor to fix parameters related to the nodes of each L layer and to update parameters related to the nodes of each L+1 layers.” A search of the prior art did not reveal references that taught or suggested these limitations. The Examiner, therefore, finds the limitations of claim 9 as allowable over the prior art.  

Regarding independent claim 12, Bhaskar et al. (U.S. Patent Application Publication No. 2017/0200265 A1) discloses: In a deep neural network (Paragraph [0109]: “Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks.”) including an input layer, output layer, and a plurality of hidden layers 
(Paragraph [0140]: “An autoencoder, autoassociator or Diabolo network is an artificial neural network used for unsupervised learning of efficient codings. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. Recently, the autoencoder concept has become more widely used for learning generative models of data. Architecturally, the simplest form of an autoencoder is a feedforward, non-recurrent neural network very similar to the multilayer perceptron (MLP)—having an input layer, an output layer and one or more hidden layers connecting them—, but with the output layer having the same number of nodes as the input layer, and with the purpose of reconstructing its own inputs (instead of predicting the target value given inputs). Therefore, autoencoders are unsupervised learning models. An autoencoder always consists of two parts, the encoder and the decoder. Various techniques exist, to prevent autoencoders from learning the identity function and to improve their ability to capture important information and learn richer representations. The autoencoder may include any suitable variant of autoencoder such as a Denoising autoencoder, sparse autoencoder, variational autoencoder, and contractive autoencoder.”
Paragraph [0141]: “A GAN included in the embodiments described herein may be configured as described in “Generative Adversarial Nets,” Goodfellow et al., arXiv:1406.2661, Jun. 10, 2014, 9 pages, which is incorporated by reference as if fully set forth herein. Goodfellow et al. describe a new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. The learning based models of the embodiments described herein may be further configured as described by Goodfellow et al.”
The Examiner finds the non-recurrent neural network having an input layer, an output layer and one or more hidden layers as disclosed in Bhaskar teaches the claimed “deep neural network including an input layer, output layer, and a plurality of hidden layers”.) . . .
[deep] Gaussian process (Paragraph [0136]: “With respect to a Deep Gaussian process, as set forth in “Deep Gaussian Processes,” by Damianou et al., Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2013, 9 pages, which is incorporated by reference as if fully set forth herein, deep Gaussian process (GP) models are deep belief networks based on Gaussian process mappings. The data is modeled as the output of a multivariate GP. The inputs to that Gaussian process are then governed by another GP. A single layer model is equivalent to a standard GP or the GP latent variable model (GP-LVM). Inference in the model may be performed by approximate variational marginalization. This results in a strict lower bound on the marginal likelihood of the model which can be used for model selection (number of layers and nodes per layer). Deep belief networks are typically applied to relatively large data sets using stochastic gradient descent for optimization. A fully Bayesian treatment allows for the application of deep models even when data is scarce. Model selection by variational bound shows that a five layer hierarchy is justified even when modelling a digit data set containing only 150 examples. The learning based models of the embodiments described herein may be further configured as described in the above incorporated reference by Damianou et al.”).
 However, the Examiner finds Bhaskar, Panchapagesan and Li do not teach or suggest the claimed “deep neural network including an input layer, output layer, and a plurality of hidden layers, a method comprising: determining inputs for the input layer and labels for the output layer related to a first sample; and estimate similarity between different pairs of inputs and labels between a second sample with the first sample using Gaussian process regression.” A search of the prior art did not reveal references that taught or suggested these limitations. The Examiner, therefore, finds the limitations of claim 12 as allowable over the prior art.  

Regarding independent claim 15, the Examiner finds Bhaskar, Panchapagesan and Li do not teach or suggest the claimed “system for a server comprising: a processing core including a deep convolutional neural network (CNN); an I/O controller hub coupled to the processor core to provide network, data storage, and CNN access; and a graphics processor to calculate tensors for nodes of an input layer, output layer, and a plurality of hidden layers of the CNN and to perform a deep convolutional Gaussian regression process to estimate similarity between different pairs of inputs and labels between a first sample and a second sample using the calculated tensors.” A search of the prior art did not reveal references that taught or suggested these limitations. The Examiner, therefore, finds the limitations of claim 15 as allowable over the prior art.  

	Claims 10-11 and 13-14 are also allowable due to their dependency on an allowable base claim.

Prior Art
	The prior art of record, considered pertinent to the applicant’s disclosure, is listed in the attached PTO-892 form.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLE VALLECILLO whose telephone number is (571)272-7716. The examiner can normally be reached 8:30 A.M. - 4:30 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALBERT DECADY, can be reached at (571)272-3819. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KYLE VALLECILLO/Primary Examiner, Art Unit 2112