DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The Abstract is essentially a copy of the method claim and contains legal language, which is impermissible. Appropriate correction is required. 

The use of the terms JAVA®, JAVASCRIPT®, PYTHON®, PERL®, and LUA®, each of which is a trade name or a mark used in commerce, has been noted in this application. Each term should be capitalized wherever it appears and be accompanied by the generic terminology. Each should also include a ®, TM or SM, whichever is appropriate.


Allowable Subject Matter
Claims 8 and 28 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Specifically, the second limitation on lines 4-5 denoting a computation of a difference between the partial derivatives with a largest activation value and a second largest activation value can be novel. While the cited prior references can teach partial derivatives, they do not explicitly teach a difference of the partial derivatives between a largest activation value and a second largest activation value. Accordingly, the claims contain allowable subject matter. 
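
For illustration only, one possible reading of the limitation discussed above can be sketched as follows. Claims 8 and 28 are not reproduced in this action, so the sketch assumes that each node has an activation value and a partial derivative of the objective with respect to that node, and that the difference is taken between the derivatives at the two nodes with the highest activations. The function name and the list-based inputs are hypothetical.

```python
def top_two_derivative_difference(activations, partials):
    """Difference between the partial derivatives at the nodes having the
    largest and the second-largest activation values (assumed reading)."""
    if len(activations) != len(partials) or len(activations) < 2:
        raise ValueError("need two or more nodes with matching derivatives")
    # Node indices ordered from largest to smallest activation value.
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    largest, second_largest = order[0], order[1]
    return partials[largest] - partials[second_largest]
```

With activations [0.1, 0.7, 0.2], the nodes at index 1 and index 2 are selected, and the result is the derivative at index 1 minus the derivative at index 2.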

Claims 13 and 33 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, as well as rewritten to address the §112(b) rejection.
 
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.





The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 13 and 33 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claims 13 and 33 recite δ(n) but it is not defined in the claims. The claims also recite “sign” with no further explanation. Accordingly, Applicant is asked to provide these definitions in the claims. Reference can be made to specification [0021] for definition information.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1, 3-6, 14-21, 23-26, and 34-40 are rejected under 35 U.S.C. 103 as being unpatentable over Faivishevsky et al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0240010, hereinafter Faivishevsky) in view of Bengio et al. (U.S. Pat. No. 6,128,606, hereinafter Bengio).

Regarding claim 1, Faivishevsky teaches:
A computer-implemented method for analyzing a first machine learning system via a second machine learning system, the first machine learning system comprising …, the method comprising ([0011] and [0020]-[0021]: describing optimization and accuracy in the operation of the machine learning (ML) ensemble model.): 
connecting, by a computer system ([0011] and [0016]: describing a computing device for operating the ML model and networks.), the first machine learning system to an input of the second machine learning system ([0018], [0021], and [0027]: describing that the first ML network, i.e. the deep neural network (DNN), is connected to the second ML network, i.e. the recurrent neural network (RNN). This is shown in Fig. 3.); 
wherein the second machine learning system ([0021] and [0027]: describing the RNN, which is depicted in Fig. 3.)…;
([0018] and [0020]-[0021]: describing that various parameters are input into the DNN. This is also shown in Fig. 3.);
collecting, by the computer system, internal characteristic data from the first machine learning system associated with the internal characteristic ([0025]-[0026]: describing a representation of the input data using the DNN, wherein such representation comprises an optimal representation of the input data, with such data being in correlation with accuracy values. The input data can comprise various internal parameters, e.g. loss function, update rule, etc. ([0022]). Whereby the representation comprising input data and accuracy can denote internal characteristic data.); ….

While the cited reference teaches the limitations of claim 1, it does not explicitly teach: “a first objective function” in the preamble; “comprises a second objective function for analyzing an internal characteristic of the first machine learning system” on lines 6-7; and “computing, by the computer system, partial derivatives of the first objective function through the first machine learning system with respect to the data item; and computing, by the computer system, partial derivatives of the second objective function through both the second machine learning system and the first machine learning system with respect to the collected internal characteristic data” on lines 11-15. Bengio discloses the claim limitations, teaching: 
“a first objective function”: describing an objective function E that can be calculated for nth number of machine learning modules (Bengio col. 5, lines 19-52).  
“comprises a second objective function for analyzing an internal characteristic of the first machine learning system”: describing an objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules (Bengio col. 5, lines 19-52).  Wherein E can be globally computed and optimized over all the modules (Bengio col. 8, lines 22-38). The tunable parameters can be for example, weights in a convolutional neural network (Bengio col. 9, lines 15-20). 
“computing, by the computer system, partial derivatives of the first objective function through the first machine learning system with respect to the data item (Bengio col. 5, lines 19-57: describing an objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules, wherein a “partial derivative of E” can be computed with regards to the tunable parameters, inputs, and outputs of the modules.); and
computing, by the computer system, partial derivatives of the second objective function through both the second machine learning system and the first machine learning system with respect to the collected internal characteristic data (Bengio describing a global objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules, wherein a “partial derivative of E” can be computed with regards to the tunable parameters, inputs, and outputs of the modules (Bengio col. 5, lines 19-57). Wherein E can be globally computed and optimized over all the modules (Bengio col. 8, lines 22-38).)”.
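Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the calculations in Bengio. Doing so would enable “a new architecture for trainable systems that significantly extends the domain of applications of multi-layered networks and gradient-based learning” (Bengio col. 16, lines 50-53), wherein the various modules can be optimized and “usable as part of a globally trainable system” (Bengio col. 17, lines 3-5).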
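
For illustration only, the two computing limitations of claim 1 can be sketched with a pair of scalar models. The sketch is not taken from Faivishevsky, Bengio, or the application. The tanh node, the squared-error objectives, and every name used (first_system, second_system, w1, w2, target1, target2) are assumptions chosen to keep the chain rule visible.

```python
import math

# Hypothetical one-node "first machine learning system": h = tanh(w1 * x).
# The value h stands in for the collected internal characteristic data.
def first_system(x, w1):
    return math.tanh(w1 * x)

# Hypothetical "second machine learning system" fed by the first: z = w2 * h.
def second_system(h, w2):
    return w2 * h

def first_objective_grad_wrt_input(x, w1, target1):
    """dJ1/dx for J1 = 0.5 * (h - target1)**2, through the first system only."""
    h = first_system(x, w1)
    dJ1_dh = h - target1
    dh_dx = (1.0 - h * h) * w1          # derivative of tanh(w1 * x) w.r.t. x
    return dJ1_dh * dh_dx

def second_objective_grads(x, w1, w2, target2):
    """Partial derivatives of J2 = 0.5 * (z - target2)**2.

    Returns (dJ2/dh, dJ2/dw1): the derivative with respect to the internal
    value h, and that derivative carried back into the first system's
    parameter w1.
    """
    h = first_system(x, w1)
    z = second_system(h, w2)
    dJ2_dz = z - target2
    dJ2_dh = dJ2_dz * w2                # back through the second system
    dJ2_dw1 = dJ2_dh * (1.0 - h * h) * x    # then back through the first system
    return dJ2_dh, dJ2_dw1
```

The second function shows the derivative of the second objective passing through both systems: first through the second system to reach h, then through the first system to reach w1.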

Regarding claim 3, the rejection of claim 1 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 1, wherein connecting the first machine learning system to the second machine learning system comprises: connecting, by the computer system, an internal element of the first machine learning system to an input of the second machine learning system ([0026]-[0027]: describing that the DNN has five layers, wherein the last layer of the DNN is connected to the input of the RNN. This is also shown in Fig. 3. It is understood by PHOSITA that each layer of a NN comprises a plurality of nodes.).

Regarding claim 4, the rejection of claim 3 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 3, wherein the first machine learning system comprises a neural network and the internal element comprises a node of the neural network ([0021] and [0026]: describing that the first ML network can be a DNN with a plurality of layers. Wherein it is understood by PHOSITA that each layer of a NN comprises a plurality of nodes.).

Regarding claim 5, the rejection of claim 1 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 1, wherein the internal characteristic data comprises a latent variable ([0026]: describing that each layer of the DNN can include a rectified linear unit (ReLU) activation value, wherein the value can comprise a latent variable.).

Regarding claim 6, the rejection of claim 5 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 5, wherein: 
the first machine learning system comprises a neural network comprising an inner node ([0021] and [0026]: describing the layers of the DNN that can comprise a plurality of hidden layers. Wherein it is understood by PHOSITA that each layer of a NN comprises a plurality of nodes. Thus, it is understood that the hidden layers comprise a plurality of hidden nodes.); and 
the latent variable comprises an activation value of the inner node ([0026]: describing that the layers of the DNN can include a rectified linear unit (ReLU) activation value for each layer, wherein the layers can comprise hidden layers and correspondingly, hidden nodes.).

Regarding claim 14, the rejection of claim 1 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 1, wherein the internal characteristic data comprises a comparison between a first value of a latent variable for a current training iteration and a second value of the latent variable for a prior training iteration of the first machine learning system ([0017], [0025], and [0029]-[0030]: describing that an accuracy value of the machine learning network can be computed “at an associated training iteration” and evaluated to determine if the machine learning network has been optimized and can perform at a predetermined level or if further training is needed. Wherein the accuracy computation can include parameter configurations in correlation with the layers and composition of the NNs including activations, which can denote latent variables ([0025]-[0026]).).

Regarding claim 15, the rejection of claim 1 is incorporated. Bengio further teaches: 
The computer-implemented method of claim 1, wherein the internal characteristic data comprises a comparison between a first derivative calculated with respect to a parameter of the first machine learning system for a current training iteration and a second derivative calculated with respect to the parameter of the first machine learning system for a prior training iteration of the first machine learning system (Bengio col. 6, lines 1-38: describing an iterative process comprising forward pass and backward propagation computations for the objective function E, wherein the computed values are examined and evaluated to determine if “validation” is reached, i.e. a stopping criterion is reached. Wherein the backward propagation computation of E can comprise a derivative computation with respect to various parameters for machine learning modules in an ensemble of nth machine learning modules (Bengio col. 5, lines 19-52 and col. 6, lines 59-67). The computations being performed for “each iteration of the optimization procedure” (Bengio col. 15, lines 33-39).).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the calculations in Bengio. A motivation to combine the cited references with Bengio was previously given.

Regarding claim 16, the rejection of claim 1 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 1, wherein the first machine learning system comprises a first neural network and the second machine learning system comprises a second neural network ([0021] and [0027]: describing that the first ML network can comprise the deep neural network (DNN) and the second ML network can comprise the recurrent neural network (RNN). This is shown in Fig. 3.).

Regarding claim 17, the rejection of claim 16 is incorporated. Bengio further teaches: 
The computer-implemented method of claim 16, wherein computing partial derivatives of the first objective function and computing partial derivatives of the second objective function each comprise performing a back propagation calculation (Bengio col. 7, lines 55-62: describing back propagation of the partial derivative of the objective function E that can be computed for a nth module. Similarly, see also Bengio col. 5, lines 45-46; col. 6, lines 4-6; and col. 10, lines 60-67: describing various back propagations of the partial derivatives.).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the back propagation in Bengio. A motivation to combine the cited references with Bengio was previously given.
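
For illustration only, a back propagation calculation through a chain of modules can be sketched as a forward pass followed by a backward pass in reverse order. The sketch is not code from Bengio. The ScaleModule class, the backpropagate function, and the squared-error objective E are hypothetical names and choices.

```python
class ScaleModule:
    """Hypothetical module computing y = w * x with one tunable parameter w."""

    def __init__(self, w):
        self.w = w
        self.x = None
        self.grad_w = 0.0

    def forward(self, x):
        self.x = x                      # remember the input for the backward pass
        return self.w * x

    def backward(self, grad_y):
        self.grad_w = grad_y * self.x   # dE/dw for this module
        return grad_y * self.w          # dE/dx, handed to the preceding module


def backpropagate(modules, x, target):
    """Forward pass, then backward pass, for E = 0.5 * (y - target)**2."""
    y = x
    for module in modules:
        y = module.forward(y)
    grad = y - target                   # dE/dy at the output of the last module
    for module in reversed(modules):
        grad = module.backward(grad)
    return y, grad                      # final output and dE/d(original input)
```

Each module receives the derivative of E with respect to its output and returns the derivative with respect to its input, so the partial derivatives for every tunable parameter are obtained in one backward sweep.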

Regarding claim 18, the rejection of claim 1 is incorporated. Bengio further teaches: 
The computer-implemented method of claim 1, wherein computing partial derivatives of the first objective function and computing partial derivatives of the second objective function each comprise performing a numerical estimation calculation (Bengio col. 5, lines 19-52: describing computations of the partial derivatives of the global objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules. Wherein the partial derivative of E considers a plurality of input values (Bengio col. 2, lines 52-62; col. 6, lines 40-41; and the previous citations above). Thus, a numerical estimation of the partial derivative of E is being computed via the various input values.).
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the calculations in Bengio. A motivation to combine the cited references with Bengio was previously given.
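
For illustration only, one common form of numerically estimating a partial derivative is a central finite difference. Whether this is the form intended by claim 18 is an assumption. The sketch is not taken from Bengio or the application, and the names numerical_partial and objective, as well as the step size eps, are hypothetical.

```python
def numerical_partial(f, args, index, eps=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to args[index], holding the other arguments fixed."""
    up = list(args)
    down = list(args)
    up[index] += eps
    down[index] -= eps
    return (f(*up) - f(*down)) / (2.0 * eps)


def objective(x, w):
    """Hypothetical objective E = 0.5 * (w * x - 1)**2."""
    return 0.5 * (w * x - 1.0) ** 2
```

For example, numerical_partial(objective, [2.0, 3.0], 1) estimates the derivative of E with respect to w at x = 2 and w = 3, whose analytic value is (w * x - 1) * x = 10.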

Regarding claim 19, the rejection of claim 1 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 1, further comprising updating, by the computer system, learned parameters of the first machine learning system ([0029]-[0030]: describing the process for optimizing the configuration parameters and determining the optimized parameters, wherein once the desired updated configuration parameter is determined, it can be fed back into the ML model comprising the first ML network, i.e. the DNN. See also Figs. 3 and 4: showing the training and optimization process that can comprise looping back the desired updated parameters. The process can comprise an “update rule” as well as a “learning rate” and “learning decay rate” ([0022]).).

Regarding claim 20, the rejection of claim 19 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 19, wherein the internal characteristic is selected to cause the second machine learning system to alter an output of the first machine ([0029]-[0030]: describing the process for optimizing the configuration parameters and determining the optimized parameters, wherein once the desired updated configuration parameter is determined, it can be fed back into the ML model comprising the first ML network, i.e. the DNN. See also Figs. 3 and 4: showing the training and optimization process that can comprise looping back the desired updated parameters, resulting in an updated output state for the DNN via the updated parameters. The process can comprise an “update rule” as well as a “learning rate” and “learning decay rate” ([0022]).).

Regarding independent claim 21, claim 21 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 21 is a system claim that corresponds to method claim 1. A mapping is shown below for the limitations of claim 21 that differ from claim 1. 
Faivishevsky teaches:
“A computer system for analyzing a first machine learning system via a second machine learning system, the computer system comprising: 
a processor ([0009] and [0012]-[0013]: describing a processor.); and 
a memory coupled to the processor, the memory storing ([0009]: describing computer storage medium with instructions that can be executed by the processor.):…
 instructions that, when executed by the processor, cause the computer system to ([0009]: describing computer storage medium with instructions that can be executed by the processor. Wherein a computing device can comprise a processor ([0012]-[0013]).): ….”

Regarding claim 23, claim 23 is substantially similar to claim 3 and therefore is rejected on the same grounds as claim 3. Claim 23 is a system claim that corresponds to method claim 3.

Regarding claim 24, claim 24 is substantially similar to claim 4 and therefore is rejected on the same grounds as claim 4. Claim 24 is a system claim that corresponds to method claim 4.

Regarding claim 25, claim 25 is substantially similar to claim 5 and therefore is rejected on the same grounds as claim 5. Claim 25 is a system claim that corresponds to method claim 5.

Regarding claim 26, claim 26 is substantially similar to claim 6 and therefore is rejected on the same grounds as claim 6. Claim 26 is a system claim that corresponds to method claim 6.

Regarding claim 34, claim 34 is substantially similar to claim 14 and therefore is rejected on the same grounds as claim 14. Claim 34 is a system claim that corresponds to method claim 14.

Regarding claim 35, claim 35 is substantially similar to claim 15 and therefore is rejected on the same grounds as claim 15. Claim 35 is a system claim that corresponds to method claim 15.

Regarding claim 36, claim 36 is substantially similar to claim 16 and therefore is rejected on the same grounds as claim 16. Claim 36 is a system claim that corresponds to method claim 16.

Regarding claim 37, claim 37 is substantially similar to claim 17 and therefore is rejected on the same grounds as claim 17. Claim 37 is a system claim that corresponds to method claim 17.

Regarding claim 38, claim 38 is substantially similar to claim 18 and therefore is rejected on the same grounds as claim 18. Claim 38 is a system claim that corresponds to method claim 18.

Regarding claim 39, claim 39 is substantially similar to claim 19 and therefore is rejected on the same grounds as claim 19. Claim 39 is a system claim that corresponds to method claim 19.

Regarding claim 40, claim 40 is substantially similar to claim 20 and therefore is rejected on the same grounds as claim 20. Claim 40 is a system claim that corresponds to method claim 20.

Claims 2 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Faivishevsky et al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0240010, hereinafter Faivishevsky) and Bengio et al. (U.S. Pat. No. 6,128,606, hereinafter Bengio) in view of Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi).


Regarding claim 2, the rejection of claim 1 is incorporated. While the cited references teach the claim limitations, they do not explicitly teach: “adding, by the computer system, an additional output node to the first machine learning system; and connecting, by the computer system, the additional output node to an input of the second machine learning system”. Choi discloses the claim limitations, teaching: 
“adding, by the computer system, an additional output node to the first machine learning system (Choi [0088]-[0095]: describing the process of selecting, generating, and connecting a new node to a selected node in a neural network (NN), wherein the process can be repeated as needed to create additional new nodes.); and
connecting, by the computer system, the additional output node to an input of the second machine learning system (Choi [0135] and [0138]: describing the creation of an additional NN system, e.g. a second or third NN, that is connected to an initial NN system. Wherein the additional NN system can be connected to the initial NN system as shown in Fig. 11B. The initial NN having a plurality of nodes that can comprise newly created nodes as described in the previous citations.).”
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited references to include the additional nodes in Choi. Doing so would enable “a method of extending a structure of a neural network” (Choi [0139]), wherein such extension can result in the generation of a second and a third neural network (Choi [0135] and [0138]).

Regarding claim 22, claim 22 is substantially similar to claim 2 and therefore is rejected on the same grounds as claim 2. Claim 22 is a system claim that corresponds to method claim 2.

Claims 7, 9, 10, 12, 27, 29, 30, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Faivishevsky et al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0240010, hereinafter Faivishevsky) and Bengio et al. (U.S. Pat. No. 6,128,606, hereinafter Bengio) in view of Teig (U.S. Pat. No. 10,586,151, hereinafter Teig).

Regarding claim 7, the rejection of claim 1 is incorporated. While the cited references teach the claim limitations, they do not explicitly teach: “a derivative calculated with respect to a parameter of the first machine learning system”. Teig discloses the claim limitations, teaching: derivatives of activation functions for nodes in a neural network (Teig col. 15, lines 22-27). Wherein the activation functions can comprise a parameter that is intrinsic to an ML system.
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the derivative in Teig. Doing so would enable “a novel method for training a multi-layer node network that mitigates against overfitting the adjustable parameters of the network for a particular problem. During training, the method of some embodiments adjusts the modifiable parameters of the network by iteratively identifying different interior-node….” (Teig Abstract).

Regarding claim 9, the rejection of claim 7 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 7, wherein: 
the first machine learning system comprises a neural network comprising an inner node ([0021] and [0026]: describing the layers of the DNN that can comprise a plurality of hidden layers. Wherein it is understood by PHOSITA that each layer of a NN comprises a plurality of nodes. Thus, it is understood that the hidden layers comprise a plurality of hidden nodes.); and ….

While the cited references teach the limitations of claim 9, Teig further teaches:
“the derivative comprises a partial derivative of the first objective function with respect to an activation value of the inner node (Teig col. 15, lines 20-21: describing that “df4(S)/dS represents partial derivative of activation function of node 4”. Wherein node 4 is an inner node as denoted by E4 in Fig. 8.)”.
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the partial derivative in Teig. A motivation to combine the cited references with Teig was previously given. 

Regarding claim 10, the rejection of claim 7 is incorporated. Faivishevsky teaches:
The computer-implemented method of claim 7, wherein: 
the first machine learning system comprises a neural network comprising a node ([0021] and [0026]: describing that the first ML network can be a DNN with a plurality of layers. Wherein it is understood by PHOSITA that each layer of a NN comprises a plurality of nodes.); and …. 



While the cited references teach the limitations of claim 10, Bengio further teaches: 
“the derivative comprises a partial derivative of the first objective function with respect to an input to the node (Bengio col. 5, lines 19-57: describing an objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules, wherein a “partial derivative of E” can be computed with regards to the inputs of the modules.)”. 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the partial derivative in Bengio. A motivation to combine the cited references with Bengio was previously given.

Regarding claim 12, the rejection of claim 7 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 7, wherein: 
the first machine learning system is one of an ensemble of machine learning systems ([0021] and [0027]: describing that the first ML network, i.e. the DNN, is part of an ensemble of ML models comprising a second ML network, i.e. the RNN. This is shown in Fig. 3.); and 
… ([0021] and [0027]: describing that the first ML network, i.e. the DNN, is part of an ensemble of ML models comprising a second ML network, i.e. the RNN. This is shown in Fig. 3.)



While the cited references teach the limitations of claim 12, Bengio further teaches:
“the derivative comprises a partial derivative of a difference between an output of the first machine learning system and a correct output of the ensemble of machine learning systems (Bengio col. 10, lines 46-65: describing that the objective function E and its partial derivative is based on a “difference between the accumulated penalty of the correct answer, and the negative log-likelihood” of a forward pass of the data. Wherein the computation can be performed for the plurality of ML modules as shown in Figs. 2 and 3. Furthermore, the optimization process of E can continue until “validation” is achieved, denoting that a desired objective is achieved, e.g. desired parameters or performance (Bengio col. 6, lines 4-20).).” 
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited reference to include the partial derivative in Bengio. A motivation to combine the cited references with Bengio was previously given.

Regarding claim 27, claim 27 is substantially similar to claim 7 and therefore is rejected on the same grounds as claim 7. Claim 27 is a system claim that corresponds to method claim 7.

Regarding claim 29, claim 29 is substantially similar to claim 9 and therefore is rejected on the same grounds as claim 9. Claim 29 is a system claim that corresponds to method claim 9.

Regarding claim 30, claim 30 is substantially similar to claim 10 and therefore is rejected on the same grounds as claim 10. Claim 30 is a system claim that corresponds to method claim 10.

Regarding claim 32, claim 32 is substantially similar to claim 12 and therefore is rejected on the same grounds as claim 12. Claim 32 is a system claim that corresponds to method claim 12.

Claims 11 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Faivishevsky et al. (U.S. Pat. App. Pre-Grant Pub. No. 2018/0240010, hereinafter Faivishevsky) and Bengio et al. (U.S. Pat. No. 6,128,606, hereinafter Bengio) in view of Teig (U.S. Pat. No. 10,586,151, hereinafter Teig) and Choi et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0155049, hereinafter Choi).

Regarding claim 11, the rejection of claim 7 is incorporated. Faivishevsky teaches: 
The computer-implemented method of claim 7, wherein: 
the first machine learning system comprises a neural network ([0021] and [0026]: describing that the first ML network can be a DNN with a plurality of layers.); ….

While the cited references teach the limitations of claim 11, Choi further teaches: 
“the neural network comprises a first node, a second node, and an arc connecting the first node and the second node (Choi [0078]: describing input nodes, hidden nodes, and output nodes that can be connected together via “edges having connection weights” as shown in Fig. 1.); and ….”
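Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited references to include the connected nodes in Choi. A motivation to combine the cited references with Choi was previously given.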

While the cited references teach the limitations of claim 11, Bengio further teaches: 
“the derivative comprises a partial derivative of the first objective function with respect to a connection weight of the arc (Bengio col. 5, lines 19-57: describing an objective function E that can be calculated for machine learning modules in an ensemble of nth machine learning modules, wherein a “partial derivative of E” can be computed with regards to the tunable parameters of the modules. Wherein the tunable parameters can comprise weights, i.e. connection weights for the nodes, in the machine learning neural network modules (Bengio col. 7, lines 35-36 and col. 9, lines 17-19). The modules along with their use and arc graph representations are shown in Figs. 1A, 1B, 2, and 3.).”
Thus, it would have been obvious to Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the process and systems in the cited references to include the partial derivative in Bengio. A motivation to combine the cited references with Bengio was previously given. 
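
For illustration only, the partial derivative of an objective with respect to the connection weight of an arc between two nodes can be sketched as follows. The sketch is not taken from Bengio, Choi, or the application. The sigmoid activation, the squared-error objective J, and the names arc_weight_gradient, a_first, w_arc, and target are assumptions.

```python
import math

def arc_weight_gradient(a_first, w_arc, target):
    """dJ/dw_arc for J = 0.5 * (a_second - target)**2, where the second
    node computes a_second = sigmoid(w_arc * a_first)."""
    s = w_arc * a_first                     # input arriving at the second node
    a_second = 1.0 / (1.0 + math.exp(-s))   # activation of the second node
    dJ_da = a_second - target
    da_ds = a_second * (1.0 - a_second)     # derivative of the sigmoid
    dJ_ds = dJ_da * da_ds                   # derivative w.r.t. the node input
    return dJ_ds * a_first                  # chain rule: ds/dw_arc = a_first
```

The intermediate value dJ_ds is the partial derivative with respect to the input to the second node; multiplying it by the activation of the first node gives the derivative with respect to the arc's connection weight.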

Regarding claim 31, claim 31 is substantially similar to claim 11 and therefore is rejected on the same grounds as claim 11. Claim 31 is a system claim that corresponds to method claim 11.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Gibiansky et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0110657): describing an optimization technique for a machine learning (ML) model, wherein the optimization technique can be used to optimize parameters of an ML model. The optimization technique “determines whether to use a Gaussian, polynomial or linear kernel (first parameter), a margin width (second parameter), and whether to perform bagging (a third parameter)”. The system enables a selection between different ML models and the optimization parameters and techniques that control the ML models’ behavior. 
Stork et al. (U.S. Pat. No. 5,636,326): describing a technique for optimal weight pruning in artificial neural networks (ANNs) which comprises in-depth computations of partial derivatives for the ANN. The partial derivatives can be calculated in relation to the output of the ANN, as well as its weights and activation functions via the delta rule. The computations of the partial derivatives can be performed by using Hessian type matrices.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2121




/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121