DETAILED ACTION
1.	This communication is in response to the amendments filed on October 12, 2022 for Application No. 16/160,933 in which claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
3.	The amendments filed on October 12, 2022 have been considered. Claims 1, 8, and 12 have been amended. Thus, Claims 1-20 are pending and presented for examination.

4.	Applicant’s arguments with respect to the 35 U.S.C. 112(b) rejection of Claims 1-20 regarding the term “differential equations network (DEN)” filed October 12, 2022 have been fully considered and are persuasive. Therefore, the 35 U.S.C. 112(b) rejection of Claims 1-20 has been withdrawn.
Similarly, Applicant’s amendment to Claim 12 to clarify that the non-transitory memory is part of the computer-readable storage medium has been fully considered. The 35 U.S.C. 112(b) rejection of Claim 12 regarding the non-transitory memory has also been withdrawn.

5.	Applicant’s arguments with respect to the 35 U.S.C. 103 rejection of Claims 1-20 filed October 12, 2022 have been fully considered but are not persuasive.
Applicant’s arguments on Pg. 8 of Applicant’s Arguments/Remarks filed October 12, 2022 mention that Claim 1 has been amended to require a compact DEN including only one input layer, one hidden layer, and one output layer. However, as previously presented in Claim 2, Lagaris already teaches a multilayer perceptron with a single input layer, single hidden layer, and single output layer. Further, Examiner has cited a Wikipedia reference on “Multilayer Perceptron” models, which further supports that these models are compact and typically consist of only three layers, including one input layer, one or more hidden layers, and one output layer. Thus, Examiner has updated the claim limitation mapping in the section below.
Further, Applicant’s arguments on Pgs. 9-10 of Applicant’s Arguments/Remarks filed October 12, 2022 state:
“Turning now to the rejections, the Office relies on Lagaris and Harmon to teach each and every element of claim 1. Both Lagaris and Harmon relate to a differential equations network with a single activation function being learned within a single neural layer. Lagaris teaches an artificial neural network with hidden units (neurons) of a hidden neural layer learning the same activation function. In Section 4 on page 7 of Lagaris, it is disclosed that each hidden unit of the one hidden layer is taught the same sigmoid activation. There are no examples in Lagaris that deviate from this embodiment. Thus, Lagaris fails to teach that each neuron of the n neurons learns a different one of the n activation functions of the only hidden neural network layer of a DEN. 
The Office relies on Harmon to cure the deficiencies of Lagaris.”
Examiner reiterates that the Harmon reference of record is instead used to teach that each neuron of the n neurons learns a different one of the n activation functions. Applicant states on Pg. 10 that “Furthermore, presenting each neuron with the same ensemble of activation functions, as disclosed in Harmon, is not akin to teaching each neuron a different activation function of n activation functions”. Examiner respectfully disagrees. Claim 1 simply states “selecting an activation function of the n activation functions to teach to a neuron of the n neurons, where each neuron of the n neurons learns a different one of the n activation functions to provide a compact DEN”, and the Harmon reference teaches the activation ensemble, in which multiple activation functions are active at each neuron, and states on Pg. 7 that “Thus, we see that even with the same network and sets of activation functions, the model chooses different optimal activation functions based upon each individual dataset”. Thus, using the broadest reasonable interpretation of Claim 1, Harmon teaches wherein each neuron is taught a different activation function of n activation functions, since each neuron is provided with an ensemble of different functions to use and is capable of learning each different activation function within the ensemble.
	Thus, the 35 U.S.C. 103 rejection of Claims 1-20 is maintained. 

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9.	Claims 1-6 and 8-17 are rejected under 35 U.S.C. 103 as being unpatentable over Lagaris et al. (hereinafter Lagaris) (“Artificial Neural Networks for Solving Ordinary and Partial Differential Equations”), in view of Harmon et al. (hereinafter Harmon) (“Activation Ensembles for Deep Neural Networks”).
Regarding Claim 1, Lagaris teaches a method for performing differential equations network (DEN) computations for a DEN having a plurality of DEN layers (Lagaris, Pg. 2, Section 1 Introduction, “We present a general method for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs), that relies on the function approximation capabilities of feedforward neural networks and results in the construction of a solution written in a differentiable, closed analytic form.”, thus, a feedforward neural network having a plurality of layers is used for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs)) comprising only one input layer (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p.”, thus a single input layer is also used. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), only one hidden neural network layer (Lagaris, Pg. 2, Section 1 Introduction, “Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, it is reasonable to consider this type of network architecture as a candidate model for treating differential equations.”, thus there is a single hidden layer comprising n neurons), and only one output layer (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the network contains one hidden layer with n neurons and an input and output layer. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), the method comprising: 
obtaining, from n input neurons along n dimensions of the DEN (Lagaris, Pg. 4, Section 2.1 Gradient Computation, “Consider a multilayer perceptron with n input units, one hidden layer with H sigmoid units and a linear output unit”, therefore, the feedforward neural network consists of n input neurons along n dimensions of the network, alongside a hidden neural network layer.), 

Lagaris does not explicitly disclose n activation functions for n neurons of a hidden neural network layer of the DEN.
However, Harmon teaches n activation functions for n neurons of a hidden neural network layer of the DEN (Harmon, Pg. 4, Section 3.1 Activation Sets, “To explore the strength of our ensemble method, we create three sets of activations to take advantage of the weakness of individual functions. The first set is a number of activation functions seen in networks today. One of the functions, the exponential linear units, garners favorable results with datasets such as CIFAR-100. Others include the less popular inverse absolute value function and the sigmoid function (which is primarily relegated to recurrent networks in most literature).”, therefore, n activation functions are obtained for the n neurons of the hidden neural network layer. A multitude of activation functions are obtained for n neurons such as sigmoid, hyperbolic tangent, soft ReLU, ReLU, inverse absolute value, and exponential linear function.) 

Lagaris does not explicitly disclose selecting an activation function of the n activation functions to teach to a neuron of the n neurons where each neuron of the n neurons learns a different one of the n activation functions to provide a compact DEN;
However, Harmon teaches selecting an activation function of the n activation functions to teach to a neuron of the n neurons (Harmon, Pg. 1, Section 1 Introduction, “The advantage of our architecture is that rather than choosing activations at specified layers or over an entire network, one can give the network the option to choose the best possible activation function of each neuron at each layer.”, thus, an activation function of the n activation functions is selected to teach to each neuron of the n neurons), where each neuron of the n neurons learns a different one of the n activation functions to provide a compact DEN (Harmon, Pg. 1, Abstract, “We call this technique an “activation ensemble” because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, α, at each activation layer of a network to allow for multiple activation functions to be active at each neuron.”, therefore each neuron of the n neurons is able to have multiple activation functions active and thus may learn a different one of the n activation functions); and 

Lagaris does not explicitly disclose predicting an outcome based on a combination of learned activation functions.
However, Harmon teaches predicting an outcome based on a combination of learned activation functions (Harmon, Pg. 8, Section 5 Conclusion, “Similar to common ensembling techniques in general machine learning, an activation ensemble is a combination of activation functions at each neuron in a neural network. We describe the implementation for standard feed-forward networks, convolutional neural networks, residual networks, and convolutional autoencoders. We create a convex combination of activation functions yet to be seen in literature with an algorithm to solve the new projection problem associated with our model.”, therefore, a combination of learned activation functions is used to project/predict an outcome).
		
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network capable of performing differential equations computations as disclosed by Lagaris to include the activation ensemble, in which each neuron at each layer is able to have a unique activation function as disclosed by Harmon. One of ordinary skill in the art would have been motivated to make this modification to produce a neural network capable of performing differential equations computations that can use the optimal activation function for each neuron and achieve superior results by combining the advantages of multiple activation functions (Harmon, Pg. 1, Section 1 Introduction, “The end result is a novel activation function that is a combination of existing functions. Each activation function, from ReLU to hyperbolic tangent contain advantages in learning. We propose to use the best parts of each in a dynamic way decided by variables configuring contributions of each activation function. These variables put weights on each activation function under consideration and are optimized via backpropagation. As data passes through a deep neural network, each layer transforms the data to better interpret and gather features. Therefore, the best possible function at the top of a network may not be optimal in the middle or bottom of a network. The advantage of our architecture is that rather than choosing activations at specified layers or over an entire network, one can give the network the option to choose the best possible activation function of each neuron at each layer.”).
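
As a minimal illustrative sketch of the activation ensemble described in the Harmon passages quoted above, the following Python code forms a weighted combination of several activation functions at a single neuron. The helper names (normalize, ensemble_neuron, dominant_activation), the choice of three candidate functions, and the clip-and-rescale step are assumptions made for illustration only; Harmon's own projection algorithm and activation normalization are not reproduced here.

```python
import math

# Candidate activation functions: an illustrative subset of the first
# activation set named in Harmon (sigmoid, tanh, ReLU).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def normalize(weights):
    """Hypothetical helper: clip weights to be non-negative and rescale
    them to sum to 1, so the result is a convex combination."""
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        # Fall back to equal weights when every weight was clipped to zero.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def ensemble_neuron(z, alpha):
    """Output of one neuron for pre-activation z, computed as a convex
    combination of the candidate activations weighted by alpha."""
    alpha = normalize(alpha)
    return sum(a * f(z) for a, f in zip(alpha, ACTIVATIONS.values()))


def dominant_activation(alpha):
    """Name of the candidate activation carrying the largest weight."""
    names = list(ACTIVATIONS)
    alpha = normalize(alpha)
    best = max(range(len(alpha)), key=alpha.__getitem__)
    return names[best]
```

Under these assumptions, two neurons holding different weight vectors, for example [0.6, 0.3, 0.1] and [0.1, 0.7, 0.2], would report different dominant functions from the same candidate set, while each neuron's output remains a blend of all three functions.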

Regarding Claim 2, Lagaris in view of Harmon teaches the method of claim 1, wherein the hidden neural network layer is a single neural layer comprising each of the n neurons (Lagaris, Pg. 2, Section 1 Introduction, “Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, it is reasonable to consider this type of network architecture as a candidate model for treating differential equations.”, thus there is a single hidden layer comprising n neurons), and where there are no other layers in the compact DEN other than an input layer (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p.”, thus a single input layer is also used.), the hidden neural network layer, and an output layer (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the network contains one hidden layer with n neurons and an input and output layer).

Regarding Claim 3, Lagaris in view of Harmon teaches the method of claim 2, wherein the n neurons are configured to learn activation functions independently of one another (Harmon, Pg. 3, Section 3 Activation Ensembles, “Ensemble Layers were created with the idea of allowing a network to choose its own activation for each neuron and for each layer of the network. Overall, the network takes the output of a previous layer, for example from a convolutional step, applies its various activations, normalizes these activations, and places weights on each activation function. We first go through each step of the process of making such a layer.”, thus, each of the n neurons is configured to learn activation functions independently as the network chooses the optimal activation function per neuron).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 4, Lagaris in view of Harmon teaches the method of claim 1, wherein the selecting is based on at least an error between an actual value and the outcome (Harmon, Pg. 5, Section 4 Experiments, “We train on our three activation sets as well as the same networks with rectifier units for the original networks since they are the standard in most cases. Our stopping criterion is based upon the validation error for each network except for the residual network, in which the suggested number of epochs is 82. Also, we apply Ada Delta for each optimization step for all new variables as well, α,η, and δ for each minibatch. In Table 1 below, we summarize the test accuracy (reconstruction loss for STL-10) of our datasets and various models. Each number is an average over five runs with different random seeds. Note that the largest improvement is found for the ISOLET dataset.”, therefore the activation set selection is based on training the network and comparing validation error), where the outcome is based on activation functions learned by each of the n neurons (Harmon, Pg. 3, Section 3 Activation Ensembles, “The first naive approach is to simply take the input, use a variety of activation functions, and add these activation functions together. We denote the set of activation functions we use to train a network fj ∈ F. Using this method, a network may reap the benefits of having more than one activation function, which may extract different features from the input.”, therefore the outcome/output is based on the activation function learned by each of the n neurons).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 5, Lagaris in view of Harmon teaches the method of claim 4, wherein the n neurons of the hidden neural network layer learn activation functions to decrease the error (Harmon, Pg. 1, Section 1 Introduction, “As data passes through a deep neural network, each layer transforms the data to better interpret and gather features. Therefore, the best possible function at the top of a network may not be optimal in the middle or bottom of a network. The advantage of our architecture is that rather than choosing activations at specified layers or over an entire network, one can give the network the option to choose the best possible activation function of each neuron at each layer.”, therefore the n neurons of the neural network learn activation functions and the optimal activation function is chosen for each neuron, so as to decrease error).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 6, Lagaris in view of Harmon teaches the method of claim 1, wherein the n neurons comprise at least a first neuron which learns a first activation function and a second neuron which learns a second activation function, wherein the first activation is different than the second activation function (Harmon, Pg. 5, Section 4.1 Comparing Activation Functions, “We first explore the α parameter values of our activation functions. We primarily concentrate on the first set of activations (Sigmoid, Tanh, ReLU, Soft ReLu, ExpLin, InvAbs). Since ReLU is the most common activation function in literature, we expect it be chosen the most by our networks. As seen Figures 1,3,4, and 5, we find this to be true. However, neurons that are deeper may not choose any particular activation. In fact, at some neurons in the bottom layers, the parameters for choosing a function are nearly equal.”, thus, each neuron may learn a separate activation function or a combination of activation functions that may differ between first and second neurons based on the optimal activation function per neuron).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 8, Lagaris teaches a system (Lagaris, Pg. 3, Section 1 Introduction, “The method is general and can be applied to ODEs, systems of ODEs and to PDEs as well.”, thus the method presented can be applied to a system of ODEs and/or PDEs) for performing differential equations network (DEN) computations for a DEN having a plurality of DEN layers (Lagaris, Pg. 3, Section 1 Introduction, “We present a general method for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs), that relies on the function approximation capabilities of feedforward neural networks and results in the construction of a solution written in a differentiable, closed analytic form.”, thus, a feedforward neural network having a plurality of layers is used for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs)) comprising only one input layer (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p.”, thus a single input layer is also used. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), only one hidden neural network layer (Lagaris, Pg. 2, Section 1 Introduction, “Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, it is reasonable to consider this type of network architecture as a candidate model for treating differential equations.”, thus there is a single hidden layer comprising n neurons), and only one output layer (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the network contains one hidden layer with n neurons and an input and output layer. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), the system comprising: 
a computer system comprising one or more controllers with non-transitory memory stored (Lagaris, Pg. 3, Section 1 Introduction, “The required number of model parameters is far less than any other solution technique and therefore, compact solution models are obtained, with very low demand on memory space. The method is general and can be applied to ODEs, systems of ODEs and to PDEs as well. The method can be realized in hardware, using neuroprocessors, and hence offer the opportunity to tackle in real time difficult differential equation problems arising in many engineering applications. The method can also be efficiently implemented on parallel architectures.”, thus a controller may be implemented in parallel architecture and due to the lesser number of required model parameters, compact solutions have a low demand on aforementioned memory) thereon that when executed enable the controller to: 
obtain, from n input neurons along n dimensions of the DEN (Lagaris, Pg. 4, Section 2.1 Gradient Computation, “Consider a multilayer perceptron with n input units, one hidden layer with H sigmoid units and a linear output unit”, therefore, the feedforward neural network consists of n input neurons along n dimensions of the network, alongside a single hidden neural network layer.), 

Lagaris does not explicitly disclose n activation functions for n neurons of a single hidden neural network layer of the DEN.
However, Harmon teaches n activation functions for n neurons of a single hidden neural network layer of the DEN (Harmon, Pg. 4, Section 3.1 Activation Sets, “To explore the strength of our ensemble method, we create three sets of activations to take advantage of the weakness of individual functions. The first set is a number of activation functions seen in networks today. One of the functions, the exponential linear units, garners favorable results with datasets such as CIFAR-100. Others include the less popular inverse absolute value function and the sigmoid function (which is primarily relegated to recurrent networks in most literature).”, therefore, n activation functions are obtained for the n neurons of the hidden neural network layer. A multitude of activation functions are obtained for n neurons such as sigmoid, hyperbolic tangent, soft ReLU, ReLU, inverse absolute value, and exponential linear function.); 

Lagaris does not explicitly disclose teach each neuron of n neurons one of n activation functions, where each neuron of n neurons of the single hidden neural network layer learns a different activation function of the n activation functions;
However, Harmon teaches teach each neuron of n neurons one of n activation functions (Harmon, Pg. 3, Section 3 Activation Ensembles, “We denote the set of activation functions we use to train a network fj ∈ F. Using this method, a network may reap the benefits of having more than one activation function, which may extract different features from the input. However, simply adding poses a problem for most functions. Many functions, like the sigmoid and hyperbolic tangent possess different values, but they can be easily scaled to have the same range.”, therefore each neuron of n neurons can be taught one activation function or a combination of different activation functions), where each neuron of n neurons of the single hidden neural network layer learns a different activation function of the n activation functions (Harmon, Pg. 1, Abstract, “We call this technique an “activation ensemble” because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, α, at each activation layer of a network to allow for multiple activation functions to be active at each neuron.”, therefore each neuron of the n neurons is able to have multiple activation functions active and thus may learn a different one of the n activation functions); and 

Lagaris does not explicitly disclose predict an outcome based on a combination of outputs from the n neurons
However, Harmon teaches predict an outcome based on a combination of outputs from the n neurons (Lagaris, Pg. 4, Section 2.1 Gradient Computation, “The efficient minimization of equation (3) can be considered as a procedure of training the neural network where the error corresponding to each input vector xi is the value G(xi) which has to become zero. Computation of this error value involves not only the network output (as is the case in conventional training) but also the derivatives of the output with respect to any of its inputs. Therefore, in computing the gradient of the error with respect to the network weights, we need to compute not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs”, therefore, the network outputs are considered in combination in order to produce a solution/outcome).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 9, Lagaris in view of Harmon teaches the system of claim 8, wherein the n activation functions are based on solutions to a second order linear differential equation (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “To illustrate the method, we consider the first order ODE: dΨ(x)/dx = f(x,Ψ) with x ∈ [0,1] and with the IC Ψ(0) = A. A trial solution is written as: Ψt(x) = A + xN(x,p) (11) where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p. Note that Ψt(x) satisfies the IC by construction. The error quantity to be minimized is given by: E[p] = Σi {dΨt(xi)/dx − f(xi,Ψt(xi))}^2 (12) where the xi’s are points in [0,1]. Since dΨt(x)/dx = N(x,p) + xdN(x,p)/dx, it is straightforward to compute the gradient of the error with respect to the parameters p using equations (5)-(10). The same holds for all subsequent model problems. The same procedure can be applied to the second order ODE: d^2Ψ(x)/dx^2 = f(x,Ψ,dΨ/dx)”, thus, a second order differential equation is used for trial solutions and minimizing error quantity, thus training the neural network to produce the desired outcome would solicit variations in activation functions used in order to yield optimal results, as previously disclosed by Harmon in claim 8).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
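
As a minimal illustrative sketch of the Lagaris trial-solution formulation quoted in the discussion of Claim 9, the following Python code evaluates the trial solution Ψt(x) = A + xN(x,p) and the error quantity of equation (12) for a network with one input unit, one hidden layer of sigmoid units, and one linear output unit. The function names (network, trial_solution, trial_error) and the parameter layout (lists w, u, v for hidden weights, hidden biases, and output weights) are assumptions made for illustration; minimization of the error over the parameters is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network(x, w, u, v):
    """Return N(x,p) and dN/dx for a one-input network with one hidden
    layer of sigmoid units and one linear output unit.
    w, u, v are hypothetical parameter lists, one entry per hidden unit."""
    n = 0.0
    dn = 0.0
    for wi, ui, vi in zip(w, u, v):
        s = sigmoid(wi * x + ui)
        n += vi * s
        # Derivative of sigmoid(z) is s * (1 - s); the chain rule gives wi.
        dn += vi * wi * s * (1.0 - s)
    return n, dn


def trial_solution(x, A, w, u, v):
    """Trial solution A + x*N(x,p), which equals A at x = 0 by construction."""
    n, _ = network(x, w, u, v)
    return A + x * n


def trial_error(f, A, xs, w, u, v):
    """Sum over the points xs of (dPsi_t/dx - f(x, Psi_t))^2, using
    dPsi_t/dx = N(x,p) + x * dN(x,p)/dx as stated in the quoted passage."""
    total = 0.0
    for x in xs:
        n, dn = network(x, w, u, v)
        psi = A + x * n
        dpsi = n + x * dn
        total += (dpsi - f(x, psi)) ** 2
    return total
```

In this sketch, the initial condition holds for any parameter values because the factor x multiplies the network output, so training only needs to reduce the error at the chosen points in [0,1].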

Regarding Claim 10, Lagaris in view of Harmon teaches the system of claim 8, wherein the n activation functions are based on approximations of one or more of a Gauss Hypergeometric function and a polylogarithm function (Lagaris, Pg. 19, Section 4.3 Comparison with Finite Elements, “The inner products involved in the finite element formulation are computed using the nine-node Gaussian quadrature. The system of equations is solved for the nodal coefficients of the basis function expansion using the Newton’s method forming the Jacobian of the system explicitly (for both linear and nonlinear differential operators): B∆Ψ(n+1) = −R”, therefore, approximations of the Gaussian quadrature are used for solving PDE problems with a finite element method, thus training the neural network to produce the desired outcome would solicit variation in activation functions used in order to yield optimal results, as previously disclosed by Harmon in claim 8).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 11, Lagaris in view of Harmon teaches the system of claim 8, wherein the plurality of DEN layers further comprises an input layer comprising the n input neurons (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p.”, thus a single input layer is also used.), and an output layer (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the network contains one hidden layer with n neurons and an input and output layer) and where there are no additional layers other than the input layer, the output layer, and the single hidden neural layer (Lagaris, Pg. 2, Section 1 Introduction, “Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, it is reasonable to consider this type of network architecture as a candidate model for treating differential equations.”, thus there is a single hidden layer comprising n neurons), and where the single hidden neural layer comprises more than one neuron (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units […]”, therefore the hidden layer comprises more than one hidden unit/neuron) and learns more than one activation function (Harmon, Pg. 1, Abstract, “We call this technique an “activation ensemble” because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, α, at each activation layer of a network to allow for multiple activation functions to be active at each neuron.”, thus each neuron learns more than one activation function).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 12, Lagaris teaches a computer-readable storage medium storing computer executable instructions on non-transitory memory of the computer-readable storage medium (Lagaris, Pg. 3, Section 1 Introduction, “The required number of model parameters is far less than any other solution technique and therefore, compact solution models are obtained, with very low demand on memory space. The method is general and can be applied to ODEs, systems of ODEs and to PDEs as well. The method can be realized in hardware, using neuroprocessors, and hence offer the opportunity to tackle in real time difficult differential equation problems arising in many engineering applications. The method can also be efficiently implemented on parallel architectures.”, thus due to the lesser number of required model parameters, compact solutions have a low demand on aforementioned memory; this memory may store instructions to execute the method), which, when executed by a computer, will cause the computer to perform a method of performing differential equations network (DEN) computations for a DEN having a plurality of DEN layers (Lagaris, Pg. 3, Section 1 Introduction, “We present a general method for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs), that relies on the function approximation capabilities of feedforward neural networks and results in the construction of a solution written in a differentiable, closed analytic form.”, thus, a feedforward neural network having a plurality of layers is used for solving both ordinary differential equations (ODEs) and partial differential equations (PDEs)) comprising only one input layer (Lagaris, Pg. 6, Section 3.1 Solution of single ODEs and Systems of coupled ODEs, “where N(x,p) is the output of a feedforward neural network with one input unit for x and weights p.”, thus a single input layer is also used. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), only one hidden neural network layer (Lagaris, Pg. 2, Section 1 Introduction, “Since it is known that a multilayer perceptron with one hidden layer can approximate any function to arbitrary accuracy, it is reasonable to consider this type of network architecture as a candidate model for treating differential equations.”, thus there is a single hidden layer comprising n neurons), and only one output layer (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the network contains one hidden layer with n neurons and an input and output layer. Further, it is known in the art that a feedforward Multilayer perceptron model, as used by Lagaris, typically contains three or more layers including a single input layer, a single output layer and one or more hidden layers – please see cited Wikipedia “Multilayer Perceptron” reference for further support), the method comprising: 
predicting an outcome via a combination of the first and second neurons, the outcome corresponding to a combined output of the first and second neurons relayed to an output DEN layer (Lagaris, Pg. 4, Section 2.1 Gradient Computation, “The efficient minimization of equation (3) can be considered as a procedure of training the neural network where the error corresponding to each input vector xi is the value G(xi) which has to become zero. Computation of this error value involves not only the network output (as is the case in conventional training) but also the derivatives of the output with respect to any of its inputs. Therefore, in computing the gradient of the error with respect to the network weights, we need to compute not only the gradient of the network but also the gradient of the network derivatives with respect to its inputs”, therefore, the network outputs are considered in combination in order to produce a solution/outcome), and where each of the input DEN layer, the single hidden neural network layer, and the output DEN layer constitute a compact DEN (Lagaris, Pg. 3, Section 1 Introduction, “The required number of model parameters is far less than any other solution technique and therefore, compact solution models are obtained, with very low demand on memory space.”, thus, since the network has an input layer, single hidden layer, and an output layer as shown in the rejection of Claims 1 and 8, it is a compact solution model).

Lagaris does not explicitly disclose receiving a plurality of activation functions provided by one or more input neurons of an input DEN layer to a single hidden neural network layer.
However, Harmon teaches receiving a plurality of activation functions provided by one or more input neurons of an input DEN layer to a single hidden neural network layer (Harmon, Pg. 3, Section 3 Activation Ensembles, “The first naive approach is to simply take the input, use a variety of activation functions, and add these activation functions together. We denote the set of activation functions we use to train a network fj ∈ F. Using this method, a network may reap the benefits of having more than one activation function, which may extract different features from the input.”, therefore, a plurality of activation functions are provided by one or more input neurons and these activation functions may be relayed to the single hidden neural network layer as disclosed by Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units […]”); 

Lagaris does not explicitly disclose learning a first activation function via a first neuron of the hidden neural network layer and a second activation function via a second neuron of the hidden neural network layer.
However, Harmon teaches learning a first activation function via a first neuron of the hidden neural network layer and a second activation function via a second neuron of the hidden neural network layer (Harmon, Pg. 2, Section 2 Related Work, “In our work, rather than introducing stochasticity, we introduce several activations at each neuron, from which the network can chose a combination. Thus, it can reap the benefits of the sigmoid and hyperbolic tangent function without being limited to these functions at each layer”, therefore, a first activation function may be learned by the first neuron of the layer, and a second activation function may be learned by the second neuron of the layer. Since several activations are introduced, the network is able to choose different activation functions or a combination, in order to optimize performance); 
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 13, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, wherein the first activation function is different than the second activation function (Harmon, Pg. 5, Section 4.1 Comparing Activation Functions, “We first explore the α parameter values of our activation functions. We primarily concentrate on the first set of activations (Sigmoid, Tanh, ReLU, Soft ReLu, ExpLin, InvAbs). Since ReLU is the most common activation function in literature, we expect it be chosen the most by our networks. As seen Figures 1,3,4, and 5, we find this to be true. However, neurons that are deeper may not choose any particular activation. In fact, at some neurons in the bottom layers, the parameters for choosing a function are nearly equal.”, thus, each neuron may learn a separate activation function or a combination of activation functions that may differ between first and second neurons based on the most optimal activation function per neuron).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 14, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, wherein the selecting further includes selecting the first activation function and the second activation function based on an error between the combined output of the first and second neurons and an expected output, where the expected output is based on a desired output of the first and second neurons  (Harmon, Pg. 5, Section 4 Experiments, “We train on our three activation sets as well as the same networks with rectifier units for the original networks since they are the standard in most cases. Our stopping criterion is based upon the validation error for each network except for the residual network, in which the suggested number of epochs is 82. Also, we apply Ada Delta for each optimization step for all new variables as well, α,η, and δ for each minibatch. In Table 1 below, we summarize the test accuracy (reconstruction loss for STL-10) of our datasets and various models. Each number is an average over five runs with different random seeds. Note that the largest improvement is found for the ISOLET dataset.”, therefore the activation set selection is based on training the network and comparing validation error between expected output and desired output. The neural network is trained to select the activation function for each neuron that would provide the most optimal outcome).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 15, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 14, further comprising adjusting one or more coefficients and mathematical operators of the first activation function and the second activation functions to decrease the error of the combined output (Harmon, Pg. 3, Section 3 Activation Ensembles, “To solve this issue, we need to normalize the activation functions with respect to one another in order to have relatively equal contribution to learning. One option would be to use mean and standard deviation normalization; however, this would not equalize contribution. Therefore, we scale the functions to [0,1]. While building our method, we additionally performed tests using the range of [-1,1] for each activation function. We found that the performance was either close to that of [0,1] or slightly worse. In addition, allowing negative values causes additional issues when choosing the best activation functions with the α parameter we introduce later. Simply adding activations together and forcing them to have equal contribution does not solve our second goal of finding the best possible activation functions for particular problems, networks, and layers. Therefore, we apply an additional weight value, α, to each activation for each neuron. Therefore, for the output of each neuron i and m being the number of activation functions, we have the following activation function for each neuron” (please reference equation on Pg. 3), therefore, the activation function is adjusted in order to decrease the error of the combined output. Also stated within the same paragraph “Many functions, like the sigmoid and hyperbolic tangent possess different values, but they can be easy scaled to have the same range”, again showing that the activation functions can be adjusted to decrease error and find desired output).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
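
For illustration only, the weighted combination described in the Harmon passage cited for Claim 15 (each activation scaled to [0,1], then weighted per neuron by α) may be sketched as follows. This is not Harmon's implementation. The min-max scaling over a batch, the small constant eps, the choice of three activation functions, the requirement that the α values be non-negative and sum to one, and all function and variable names are assumptions made for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

# Hypothetical activation set; the cited passage lists more functions.
ACTIVATIONS = [sigmoid, math.tanh, relu]

def min_max_scale(values, eps=1e-8):
    """Scale a list of activation values to [0, 1] (assumed scaling rule)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]

def ensemble_output(pre_activations, alphas):
    """Weighted sum of scaled activations for one neuron over a batch.

    pre_activations: the neuron's pre-activation value for each batch item.
    alphas: one weight per activation function (assumed >= 0, summing to 1).
    """
    scaled = [min_max_scale([f(z) for z in pre_activations])
              for f in ACTIVATIONS]
    return [sum(a * column[k] for a, column in zip(alphas, scaled))
            for k in range(len(pre_activations))]

batch = [-2.0, -0.5, 0.0, 1.0, 3.0]
alphas = [0.2, 0.3, 0.5]  # adjustable coefficients, one per activation
outputs = ensemble_output(batch, alphas)
```

Under these assumptions, changing the alphas values changes how much each activation function contributes to the neuron's output, which is the kind of coefficient adjustment the cited passage describes.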

Regarding Claim 16, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, wherein the first and second activation functions are one or more of ReLU and sigmoid functions (Harmon, Pg. 5, Section 4.1 Comparing Activation Functions, “We first explore the α parameter values of our activation functions. We primarily concentrate on the first set of activations (Sigmoid, Tanh, ReLU, Soft ReLu, ExpLin, InvAbs). Since ReLU is the most common activation function in literature, we expect it be chosen the most by our networks.”, therefore, the first and second activation functions may comprise ReLU and sigmoid functions).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 17, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, wherein the hidden neural network is a single layer (Lagaris, Pg. 4, Section 2.1 Gradient Computation, “Consider a multilayer perceptron with n input units, one hidden layer with H sigmoid units and a linear output unit”, therefore, the feedforward neural network consists of n input neurons along n dimensions of the network, alongside a single hidden neural network layer), and where the hidden neural network is the only layer with neurons configured to learn activation functions (Lagaris, Pg. 7, Section 4 Examples, “In all cases we used a multilayer perceptron having one hidden layer with 10 hidden units and one linear output unit.”, therefore the hidden neural network layer is the only layer with neurons configured to learn activation functions and direct output to the output unit/layer).
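
For illustration only, the three-layer structure quoted from Lagaris (one input unit, one hidden layer of sigmoid units, one linear output unit) and the error value G(xi) that depends on both the network output and its derivative may be sketched as follows. The example differential equation dΨ/dx = -Ψ with Ψ(0) = 1, the trial form Ψ = 1 + x·N(x), the use of three hidden units rather than ten, the weight values, and all names are assumptions chosen for this sketch and are not taken from the cited passages.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def network(x, w, u, v):
    """N(x): one input unit, H sigmoid hidden units, one linear output unit."""
    return sum(vi * sigmoid(wi * x + ui) for wi, ui, vi in zip(w, u, v))

def network_dx(x, w, u, v):
    """dN/dx, using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))."""
    total = 0.0
    for wi, ui, vi in zip(w, u, v):
        s = sigmoid(wi * x + ui)
        total += vi * wi * s * (1.0 - s)
    return total

def residual(x, w, u, v):
    """G(x) for the assumed ODE dPsi/dx = -Psi with trial Psi = 1 + x * N(x)."""
    n = network(x, w, u, v)
    dn = network_dx(x, w, u, v)
    psi = 1.0 + x * n
    dpsi = n + x * dn
    return dpsi + psi  # zero when the trial solution satisfies the ODE

# Arbitrary, untrained weights for a hidden layer of three units.
w = [0.5, -1.0, 2.0]    # input-to-hidden weights
u = [0.1, 0.0, -0.3]    # hidden biases
v = [1.0, -0.5, 0.25]   # hidden-to-output weights
points = [0.0, 0.5, 1.0]
loss = sum(residual(x, w, u, v) ** 2 for x in points)
```

Training in this sketch would mean adjusting w, u, and v to drive loss toward zero; no training loop is shown.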

10.	Claims 7 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lagaris et al. (hereinafter Lagaris) (“Artificial Neural Networks for Solving Ordinary and Partial Differential Equations”), in view of Harmon et al. (hereinafter Harmon) (“Activation Ensembles for Deep Neural Networks”), and further in view of Corrado et al. (hereinafter Corrado) (US PG-PUB 20170032241).
Regarding Claim 7, Lagaris in view of Harmon teaches the method of claim 6, further comprising decreasing an error of the outcome determined by the first and second neurons via adjusting one or more parameters of the first activation function and the second activation function, wherein adjusting the one or more parameters comprises adjusting one or more coefficients and mathematical operators output (Harmon, Pg. 3, Section 3 Activation Ensembles, “To solve this issue, we need to normalize the activation functions with respect to one another in order to have relatively equal contribution to learning. One option would be to use mean and standard deviation normalization; however, this would not equalize contribution. Therefore, we scale the functions to [0,1]. While building our method, we additionally performed tests using the range of [-1,1] for each activation function. We found that the performance was either close to that of [0,1] or slightly worse. In addition, allowing negative values causes additional issues when choosing the best activation functions with the α parameter we introduce later. Simply adding activations together and forcing them to have equal contribution does not solve our second goal of finding the best possible activation functions for particular problems, networks, and layers. Therefore, we apply an additional weight value, α, to each activation for each neuron. Therefore, for the output of each neuron i and m being the number of activation functions, we have the following activation function for each neuron” (please reference equation on Pg. 3), therefore, the activation function is adjusted in order to decrease the error of the combined output. Also stated within the same paragraph “Many functions, like the sigmoid and hyperbolic tangent possess different values, but they can be easy scaled to have the same range”, again showing that the activation functions can be adjusted to decrease error and find desired output).

Lagaris in view of Harmon does not teach wherein the outcome is associated with healthcare.
However, Corrado teaches wherein the outcome is associated with healthcare (Corrado, Par. [0009], “A doctor or other healthcare professional can be provided with information characterizing the output of the recurrent neural network or outputs derived from outputs generated by the recurrent neural network, improving the healthcare professional's ability to provide quality healthcare to the professional's patients. For example, the healthcare professional can be provided with useful information about future health events that may become associated with a current patient, e.g., health events that are likely to be the next health event to be associated with the patient or likelihoods that certain conditions will be satisfied by events occurring within a specified time period of the most recent event in the sequence.”, therefore the neural network is capable of producing an outcome associated with healthcare).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the network as disclosed by Lagaris in view of Harmon with the ability to produce an outcome associated with healthcare as disclosed by Corrado. One of ordinary skill in the art would have been motivated to make this modification to produce a neural network that permits for the treatment of difficult real-world problems (Lagaris, Pg. 14, Section 5 Conclusions and Future Research, “Such an implementation on neural hardware is one of our near future objectives, since it will permit the treatment of many difficult real-world problems.”) and hence is capable of analyzing health events to produce a healthcare outcome (Corrado, Par. [0009], “A doctor or other healthcare professional can be provided with information characterizing the output of the recurrent neural network or outputs derived from outputs generated by the recurrent neural network, improving the healthcare professional's ability to provide quality healthcare to the professional's patients. For example, the healthcare professional can be provided with useful information about future health events that may become associated with a current patient, e.g., health events that are likely to be the next health event to be associated with the patient or likelihoods that certain conditions will be satisfied by events occurring within a specified time period of the most recent event in the sequence. Additionally, the healthcare professional can be provided with information that identifies the potential effect of a proposed treatment on the likelihoods of the events occurring, e.g., whether a proposed treatment may reduce or increase the likelihood of an undesirable health-related condition being satisfied for the patient in the future.”).
	
Regarding Claim 18, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, the rest of the claim language is taught by Lagaris in view of Harmon further in view of Corrado wherein the plurality of activation functions from the one or more input neurons (Corrado, Par. [0036], “In some implementations, the recurrent neural network layers are long short-term memory (LSTM) layers. Each LSTM layer includes one or more LSTM memory blocks. Each LSTM memory block can include one or more cells that each include an input gate, a forget gate, and an output gate that allow the cell to store previous states for the cell, e.g., for use in generating a current activation or to be provided to other components of the LSTM neural network.”, therefore, the activation may be related to the healthcare parameter input) is related to one or more healthcare parameters including age, weight, medications, geographic location, diet, current health status, and daily habits (Corrado, Par. [0034], “In some implementations, one or more of the layers in the sequence can be configured to receive, at a subset of the time steps, e.g., at the first time step, or at each time step, as part of the layer input for the layer a global input, a per-record input, or both. Global inputs are inputs that are not dependent on the current temporal sequence being processed by the recurrent neural network 110. An example of a global input is data characterizing the current time of year, e.g., the current date. Per-record inputs are inputs that may be different for different temporal sequences. Examples of per-record inputs can include a genetic sequence of the patient associated with the current temporal sequence or other information characterizing the patient, e.g., demographic information for the patient.”, thus, inputs may include genetic sequences or demographic information specific to a patient).
The reasons for obviousness have been noted in the rejection of Claim 7 above and are applicable herein.

Regarding Claim 19, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 18, the rest of the claim language is taught by Lagaris in view of Harmon further in view of Corrado wherein the first and second neurons of the hidden neural layer predict healthcare outcomes for one or more of diabetes, acute respiratory disorder, autoimmune diseases, autocrine diseases, neural diseases, mental health disorder, and cancers (Corrado, Par. [0067-0069], “In some implementations, rather than providing the data identifying the temporal sequences for presentation to the user, the system computes statistics from the subsequent events in the temporal sequences and provides the computed statistics for presentation to the user. For example, the system may determine the portion of the temporal sequences that included a particular health event, e.g., a heart attack or a stroke, subsequent to the time step for which the similar network internal state was generated. The system may then provide data identifying the proportion for presentation the user, e.g., in the form “X % of patients expected to have similar futures as the current patient experienced the particular health event.” In some implementations, rather than storing the internal states in the internal state repository, the system can re-compute the internal states for each other temporal sequence whenever an input temporal sequence is received that is to be compared to the other temporal sequences. FIG. 5 is a flow diagram of an example process 500 for generating health event data for a temporal sequence from future condition scores. For convenience, the process 500 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network training system, e.g., the neural network training system 100 of FIG. 1, appropriately programmed, can perform the process 500”, thus, future health conditions can be predicted by the neural network).
The reasons for obviousness have been noted in the rejection of Claim 7 above and are applicable herein.

Regarding Claim 20, Lagaris in view of Harmon teaches the computer-readable storage medium of claim 12, the rest of the claim language is taught by Lagaris in view of Harmon further in view of Corrado wherein the plurality of activation functions from the one or more input neurons are adjusted as the healthcare parameters change (Corrado, Par. [0082-0083], “The system determines the change in the future condition scores caused by adding the additional health event to the input temporal sequence (step 612) and provides data identifying the change for presentation to the user (step 614). That is, the system computes differences between future condition scores for the modified input temporal sequence and the corresponding future condition scores for the initial input temporal sequence and provides data identifying the differences for presentation to the user. Thus, a doctor may be able to view the effect of potential treatments on the likelihood that certain conditions will be satisfied in the future.
In some implementations, the system can perform the process 600 automatically in response to a new event being added to a temporal sequence. If the new event causes the future condition score of a condition to increase by more than a threshold or to exceed a threshold, the system can generate an alert to automatically notify the user of the change”, therefore, as healthcare parameters change, the future condition score and predictions would also change, causing an adjustment in the parameters and activation functions).
The reasons for obviousness have been noted in the rejection of Claim 7 above and are applicable herein.

Conclusion
11.	The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Meade et al. (“The Numerical Solution of Linear Ordinary Differential Equations by Feedforward Neural Networks”) disclosed a feedforward neural network, with a single input and output neuron and a single hidden layer, to approximate arbitrary linear ordinary differential equations.

12.	 THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is 571-272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123