Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action


1.	The Examiner acknowledges the applicant’s amendment filed February 19, 2021.  At this point claims 1-58, 61-65, 68, 71-78, 84, 92-93 are pending in the instant application and ready for examination by the Examiner.


Allowable Subject Matter
2.	Claims 43-52 and 92-93 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.  
	If the applicant should choose to rewrite the independent claims to include the limitations recited in either of claims 43-52 and 92-93, the applicant is encouraged to amend the title of the invention such that it is descriptive of the invention as claimed as required by sec. 606.01 of the MPEP. Furthermore, the Summary of the Invention and the Abstract should be amended to bring them into harmony with the allowed claims as required by paragraph 2 of sec. 1302.01 of the MPEP.


Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42, 53-58, 61, 64-65, 68, 71-72, 75-76 and 78 are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra in view of Nugent. (“Elements of artificial neural networks”, referred to as Mehrotra; U.S. Patent Publication 20050015351, referred to as Nugent)

Claim 1
Mehrotra discloses a computer-implemented method for controlling an artificial neural network comprising a first node and a second node, the first and second nodes (Mehrotra, p19, fig 1.13 and 1.14) comprising activation functions that are evaluatable on a dataset (Mehrotra, p19, fig 1.13 and 1.14; EC: The data set is the incoming data into layer 0.) according to an objective defined by an objective function (Mehrotra, p10, Mehrotra, p3; For large applications, the amount of training time is large, requiring several days even on the fastest processors, irrespective of whether the training method is per-epoch or per pattern. The amount of training time can be reduced by exploiting parallelism in per-epoch training, where each of P different processors calculates weight changes for each pattern independently, followed by a phase in which all errors are summed up.)…. between the first node and the second node based at least in part on activation values of the first node and estimates of partial derivatives of the objective with respect to activation values of the second node. (Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus the error is used to adjust the weights. This adjustment occurs across the connection from one node to another node.)
Mehrotra does not disclose expressly an effect on the objective caused by the existence or non-existence of a direct connection…..changing, by the computer system, a structure of the neural network based at least in part on the estimate of the effect.
Nugent discloses an effect on the objective caused by the existence or non-existence of a direct connection (Nugent; 0110-0111; ‘In other words, either there can be a connection or no connection.’ And ‘One method for solving this problem is to utilize two sets of connections for the same output, having one set represent the positive Nugent; 0110; That signal is transferred into the flow of a neurotransmitter whose effect on the receiving neuron can be either excitatory or inhibitory, depending on the neuron, thereby dedicating certain connections inhibitory and excitatory.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 2
Mehrotra discloses computing, by the computer system, a weighted average over the dataset of a product of an activation of the first node (Mehrotra, p10, fig 1.4) and a partial derivative of the objective with respect to an input to the second node for each data item in the dataset. (Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus 

Claim 3
Mehrotra discloses wherein the dataset comprises a development dataset set (Mehrotra, p51; In some problems, the input dimensions are non-numeric, and their values do not have any inherent order. For instance, the input dimension may be "color," and its values may range over the set {red, blue, green, yellow} .) aside from a training dataset on which the neural network was trained. (Mehrotra, p25; We are provided with a "training set" consisting of sample patterns that are representative of all classes, along with class membership information for each pattern. Using the training set, we deduce rules for membership in each class and create a classifier, which can then be used to assign other patterns to their respective classes according to these rules.)

Claim 4
Mehrotra does not disclose expressly the first node and the second node are not directly connected; estimating the effect on the objective comprises estimating, by the computer system, the effect on the objective of adding the direct connection between the first node and the second node; and changing the structure of the neural network comprises adding, by the computer system, the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the 
Nugent discloses the first node and the second node are not directly connected (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance ( no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states).); estimating the effect on the objective comprises estimating, by the computer system, the effect on the objective of adding the direct connection between the first node and the second node; and changing the structure of the neural network comprises adding, by the computer system, the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection. (Nugent; 0193; Each Known synapse can be composed of connection conduits, separated by a characteristic distance "d", where each connection conduit is the result of nano-particles aligning in an electric field generated by the temporal and sequential firing of the coupled base neurons (i.e., see schematic diagram 1756).) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a 

Claim 5
Mehrotra does not disclose expressly the first node and the second node are directly connected; estimating the effect on the objective comprises estimating, by the computer system, the effect on the objective of deleting the direct connection between the first node and the second node; and changing the structure of the neural network comprises deleting, by the computer system, the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the non-existence of the direct connection.
Nugent discloses the first node and the second node are directly connected; estimating the effect on the objective comprises estimating, by the computer system, the effect on the objective of deleting the direct connection between the first node and the second node; and changing the structure of the neural network comprises deleting, by the computer system, the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the non-existence of the direct connection. (Nugent, 0126; Thus, as indicated at block 1010, as the electric field is applied across the connection gap, the more the nonconductor(s) will align and the stronger the connection becomes. Connections (i.e., synapses) that are not used are dissolved back into the solution, as "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 6
Mehrotra does not disclose expressly wherein deleting the direct connection between the first node and the second node is further based at least in part on a magnitude of a connection weight associated with the direct connection between the first node and the second node.
Nugent discloses wherein deleting the direct connection between the first node and the second node is further based at least in part on a magnitude of a connection weight associated with the direct connection between the first node and the second node. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much 

Claim 8
Mehrotra discloses wherein the neural network comprises a weighted directed acyclic graph. (Mehrotra, p18-20; 1.3.3 Acyclic networks: There is a subclass of layered networks in which there are no intra-layer connections, as shown in figure 1.14. In other words, a connection may exist between any node in layer i and any node in layer j for i < j, but a connection is not allowed for i = j. The computational processes in acyclic networks are much simpler than those in networks with exhaustive, cyclic, or inter-layer connections. Networks that are not acyclic are referred to as recurrent networks.)
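For illustration only, the layer-index condition in the passage quoted for claim 8 (a connection from layer i to layer j is permitted only when i < j) can be sketched as a short check. The representation of a connection as a (source layer, destination layer) pair and the function name are assumptions made for this sketch; neither comes from Mehrotra.

```python
def is_acyclic_layered(connections):
    """Return True if every connection runs from layer i to layer j with i < j.

    `connections` is a hypothetical list of (source_layer, destination_layer)
    pairs; this encoding is an assumption made for illustration.
    """
    return all(src < dst for src, dst in connections)


# Connections that skip layers still satisfy i < j.
print(is_acyclic_layered([(0, 1), (0, 2), (1, 3)]))
# An intra-layer connection (i = j) violates the condition.
print(is_acyclic_layered([(0, 1), (1, 1)]))
```

A backward connection such as (2, 1) fails the same check, which corresponds to the networks the passage calls recurrent.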

Claim 9
Mehrotra discloses wherein the neural network comprises an artificial a recurrent neural network. (Mehrotra, p136; 4.3.1 Recurrent networks Recurrent neural networks contain connections from output nodes to hidden layer and/or input layer nodes, and they allow interconnections between nodes of the same layer, particularly between the 

Claim 10
Mehrotra discloses wherein the neural network comprises a layered feed-forward neural network. (Mehrotra, p20; 1.3.4 Feedforward networks: This is a subclass of acyclic networks in which a connection is allowed from a node in layer i only to nodes in layer i + 1, as shown in figure 1.15. These networks are succinctly described by a sequence of numbers indicating the number of nodes in each layer. For instance, the network shown in figure 1.15 is a 3-2-3-2 feedforward network; it contains three nodes in the input layer (layer 0), two nodes in the first hidden layer (layer 1), three nodes in the second hidden layer (layer 2), and two nodes in the output layer (layer 3).)
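As a sketch of the 3-2-3-2 description quoted for claim 10, the following enumerates connections that run only from layer i to layer i + 1. It assumes every node in a layer connects to every node in the next layer; the quoted passage says only that such connections are allowed, so full connectivity is an assumption of this example, as are the function name and the (layer, node) labeling.

```python
def feedforward_connections(layer_sizes):
    """List connections ((layer, node), (layer + 1, node)) between adjacent layers.

    Assumes full connectivity between adjacent layers (an illustrative
    assumption, not something stated in the quoted passage).
    """
    connections = []
    for layer in range(len(layer_sizes) - 1):
        for src in range(layer_sizes[layer]):
            for dst in range(layer_sizes[layer + 1]):
                connections.append(((layer, src), (layer + 1, dst)))
    return connections


conns = feedforward_connections([3, 2, 3, 2])
print(len(conns))  # 3*2 + 2*3 + 3*2 connections
```

Under that assumption the network has 18 connections, and none of them skips a layer or stays within one.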

Claim 11
Mehrotra discloses the first node is located in a first layer of the layered feed-forward neural network and the second node is located in a second layer of the layered feed-forward neural network. (Mehrotra, p20, fig 1.15; First node in first layer maps to any node in layer 1. The second node in the second layer maps to any node in layer 2.)
Mehrotra does not disclose expressly that the first node and the second node are not directly connected.
Nugent discloses the first node and the second node are not directly connected. (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance (no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states).) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 12
Mehrotra discloses the neural network comprises a first subnetwork and a second subnetwork; the first node is located in the first subnetwork; the second node is located in the second subnetwork. (Mehrotra, p20, fig 1.15; First node in first subnetwork maps to any node in layer 1. The second node in the second subnetwork maps to any node in layer 2.)
Mehrotra does not disclose expressly changing the structure of the neural network comprises adding, by the computer system, the direct connection from the first node to the second node based at least in part on whether an estimate of the effect on 
Nugent discloses changing the structure of the neural network comprises adding, by the computer system, the direct connection from the first node to the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection. (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance ( no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states).  EC: With the ability to make connections or break connections, Nugent can alter the schematic design of the network into an ‘improved’ design. With altering the strength of the electrical field, Nugent can alter the resistance between the nodes which maps to a software designed ‘weight’ value.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 15

Nugent discloses the first node and the second node form a cover pair in a strict partial order corresponding to a transitive closure of the neural network; and changing the structure of the neural network comprises deleting a first direct connection from the first node to the second node and adding a second direct connection from the second node to the first node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to replacement of the first direct connection with the second direct connection. (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance ( no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states).  EC: The ‘covered pair’ maps to the junctions. ‘Transitive closure’ is the potential connection between them with the nanoparticles.  With the ability to make connections or break connections, Nugent can alter the schematic design of the network into an ‘improved’ design. With altering the strength of 

Claim 16
Mehrotra does not disclose expressly freezing, by the computer system, the direct connection between the first node and the second node such that while the direct connection is frozen a connection weight of the direct connection is not changed during training of the neural network.
Nugent discloses freezing, by the computer system, the direct connection between the first node and the second node such that while the direct connection is frozen a connection weight of the direct connection is not changed during training of the neural network. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD. EC: this discloses as long and the electrical field is 

Claim 17
Mehrotra does not disclose expressly wherein the connection weight of the frozen direct connection is zero.
Nugent discloses wherein the connection weight of the frozen direct connection is zero. (Nugent, 0166; Since no connections are present, the voltage at neurons A, B, C and D are all zero and consequently all neurons output zero.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 18

Nugent discloses wherein the connection weight of the frozen direct connection is non-zero. (Nugent, 0239; Second, the lowered potential causes an increase in the electric field across all connection in a connection network currently activating the neuron. In other words, during the time of the refractory pulse, all the connections that are coming from firing neurons become stronger. EC: If a connection is made, then it is non-zero.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 19
Mehrotra does not disclose expressly unfreezing, by the computer system, the frozen direct connection.
Nugent discloses unfreezing, by the computer system, the frozen direct connection. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 20
Mehrotra discloses wherein the dataset over which the neural network is evaluated comprises a full batch of a training dataset. (Mehrotra, p80; 2. In "per-epoch" (or "batch-mode") learning, weights are updated only after all samples are presented to the network.)
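The distinction between per-epoch (batch-mode) and per-pattern updating in the quoted passages can be sketched as follows. The single-weight linear node, the squared-error update rule, and all names below are assumptions chosen to keep the example small; they are not taken from Mehrotra or from the claims.

```python
def per_epoch_update(w, samples, lr):
    """Batch mode: sum the weight changes over all samples, then update once."""
    total = sum((d - w * x) * x for x, d in samples)
    return w + lr * total


def per_pattern_update(w, samples, lr):
    """Per-pattern mode: update the weight after each sample is presented."""
    for x, d in samples:
        w = w + lr * (d - w * x) * x
    return w


# Hypothetical (input, desired output) pairs for a node computing w * x.
samples = [(1.0, 2.0), (2.0, 4.0)]
print(per_epoch_update(0.0, samples, 0.1))
print(per_pattern_update(0.0, samples, 0.1))
```

The two modes generally give different weights after one pass, because the per-pattern rule evaluates each sample against a weight already changed by the previous one.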

Claim 21
Mehrotra discloses the dataset over which the neural network is evaluated comprises a mini-batch of a training dataset; and estimating the effect on the objective comprises estimating, by the computer system, a gradient of the objective function for stochastic gradient descent. (Mehrotra, p43; Rosenblatt (1958) defines a perceptron to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using a linear function of the inputs. Minsky and Papert (1969) instead describe the perceptron as a stochastic gradient-descent algorithm that attempts to linearly 

Claim 22
Mehrotra discloses wherein the dataset comprises a first dataset, the method further comprising: assigning, by the computer system, a data influence weight to each data item in the first dataset (Mehrotra, p37, p43; ‘It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i.e., the system must perform well on new test data distinct from training data. Consider a child learning addition of one digit numbers.’ And ‘Such a perceptron can be represented by a single node that applies a step function to the net weighted sum of its inputs. The input pattern is considered to belong to one class or the other depending on whether the node output is 0 or 1.’ EC: first dataset maps to training data.); training, by the computer system, the neural network on the first dataset via stochastic gradient descent, which comprises: computing, by the computer system, a weighted average of an estimate of a gradient in each stochastic gradient descent update according to the data influence weight for each data item in the first dataset; (Mehrotra, p10, fig 1.4, p43; Rosenblatt (1958) defines a perceptron to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using a linear function of the inputs. Minsky and Papert (1969) instead describe the perceptron as a stochastic gradient-descent algorithm that attempts to linearly separate a set of n-dimensional training data.) measuring, by the computer system during training of the neural network, a performance of the neural network on a second dataset; (Mehrotra, p37; It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i.e., the system must perform well on new test data distinct from training data. Consider a child learning addition of one digit numbers. EC: Second dataset maps to testing data.); and adjusting, by the computer system during training of the neural network, the data influence weight of one or more data items in the first dataset based on the performance of the neural network. (Mehrotra, p33-34; If the system behavior changes with time, the same network that is used to generate inputs for the system may also be continually trained on-line, i.e., its weights are adjusted depending on the error measure N(i) - N(S(N(i))), the deviation between the network outputs N(i) and the result of applying the network to the system's output when applied to the network's outputs.)

Claim 23
Mehrotra does not disclose expressly wherein adjusting the data influence weight comprises setting, by the computer system, the data influence weight to zero.
Nugent discloses wherein adjusting the data influence weight comprises setting, by the computer system, the data influence weight to zero. (Nugent, 0166; Since no connections are present, the voltage at neurons A, B, C and D are all zero and consequently all neurons output zero.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing 

Claim 26
Mehrotra discloses training, by the computer system, the second node to match an output of the first node. (Mehrotra, p20, fig 1.15; One node from the second layer (second node) is connected to one node from the first layer (first node). EC: There is no specific meaning within the specification of what is meant by ‘match.’)

Claim 27
Mehrotra discloses training, by the computer system, the first node to maximize a magnitude of a correlation between the activation function of the first node and a partial derivative of the objective function with respect to an input to the second node. (Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus the error is used to adjust the weights. This adjustment occurs across the connection from one node to another node.)
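The two-step differentiation described in the passage quoted from Mehrotra, p72, is an application of the chain rule. Written out with the passage's own symbols (E for the error, o_k for the output of node k, and w_{k,j} for the weight), and assuming nothing beyond what the passage states, it reads:

```latex
\frac{\partial E}{\partial w_{k,j}}
  = \frac{\partial E}{\partial o_k}\,
    \frac{\partial o_k}{\partial w_{k,j}}
```

This is the derivative of the error with respect to a connection weight. It is shown only to make the quoted text easier to follow.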

Claim 29
(Mehrotra, p20, fig 1.15; First node in first subnetwork maps to any node in layer 1. The second node in the second subnetwork maps to any node in layer 2. The weights associated with the first layer and the weights associated with the second layer disclose different learning tasks.)

Claim 30
Mehrotra discloses inputting, by the computer system, the dataset to the first subnetwork; and inputting, by the computer system, a subset of the dataset to the second subnetwork. (Mehrotra, p20, fig 1.13; Here Mehrotra discloses some data going directly into layer 2 (second subnetwork).)

Claim 31
Mehrotra discloses training, by the computer system, the first subnetwork on the dataset; and training, by the computer system, the second subnetwork on a subset of the dataset. (Mehrotra, p46-48; Every node with input and output can be seen as a perceptron. Regardless of the location of the perceptron, the training remains the same.)

Claim 32
(Mehrotra, p20, fig 1.15; A neural network is a classifier. The number of nodes on each layer can be viewed as a ‘category.’)

Claim 33
Mehrotra discloses wherein the first subnetwork is evaluatable based on a first objective and the second subnetwork is evaluatable based on a second objective, the first objective corresponding to a plurality of categories and the second objective corresponding to a subset of the plurality of categories. (Mehrotra, p20, fig 1.15; A neural network is a classifier. How it is designed dictates the objective. It could be a simple classification between a plane and a helicopter, or classifying a living sample among the taxonomic ranks.)

Claim 34
Mehrotra discloses wherein the first subnetwork and the second subnetwork operate asynchronously, further comprising: monitoring, by the computer system, an individual performance of each of the first subnetwork and the second subnetwork via a machine learning coach executed by the computer system; and changing, by the computer system, the structure of at least one of the first subnetwork or the second subnetwork to improve a combined performance of the first subnetwork and the second subnetwork. (Mehrotra, p46-48; Every node with input and output can be seen as a 

Claim 37
Mehrotra discloses the dataset comprises a first dataset (Mehrotra, p37, p43; ‘It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i.e., the system must perform well on new test data distinct from training data. Consider a child learning addition of one digit numbers.’ And ‘Such a perceptron can be represented by a single node that applies a step function to the net weighted sum of its inputs. The input pattern is considered to belong to one class or the other depending on whether the node output is 0 or 1.’ EC: first dataset maps to training data.); and detecting the problem in the learning process of the neural network comprises detecting, by the computer system, whether a difference between a performance of the neural network with respect to the objective on the first dataset and the performance of the neural network with respect to the objective on a second dataset that is disjoint from the first dataset exceeds a threshold value via the learning coach machine learning system. (Mehrotra, p86-88; A rule of thumb, obtained from related statistical problems, is to have at least five to ten times as many training samples as the number of weights to be trained. Baum and Haussler (1989) suggest the following number, on the basis of the desired accuracy on the test set: P > |W| / (1 - a), where P denotes the (desired) number of patterns (i.e., the size of the training set), |W| denotes the number of weights to be trained, and a denotes the expected accuracy on the test set. Thus, if a network contains 27 weights and the desired test set 

Claim 38
Mehrotra discloses wherein detecting the problem in the learning process of the neural network comprises: detecting, by the computer system, whether the neural network misclassifies a particular data item of the dataset over a plurality of training epochs via the learning coach machine learning system. (Mehrotra, p62; Evaluate the performance using the following measures. a. Number of iterations in training b. Amount of computation time c. Number of misclassifications d. Mean squared error,)

Claim 39
Mehrotra discloses wherein detecting the problem in the learning process of the neural network comprises: detecting, by the computer system via the learning coach machine learning system, whether the neural network classifies a plurality of data items of the dataset into a single category, wherein the classified plurality of data items are designated to be classified into a plurality of categories. (Mehrotra, p37; The nature of the problem sometimes dictates the choice of the error measure. In classification problems, in addition to the Euclidean distance, another possible error measure is the fraction of misclassified samples.
E = Number of misclassified samples / Total number of samples)
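For illustration only, the error measure quoted from Mehrotra can be sketched in Python as follows. The function name and the list-based inputs are assumptions made for this sketch and do not appear in the reference or in the application.

```python
def misclassification_fraction(predicted, expected):
    """Return E = (number of misclassified samples) / (total number of samples).

    `predicted` and `expected` are hypothetical, equal-length sequences of
    class labels; the names are illustrative, not taken from Mehrotra.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if len(expected) == 0:
        raise ValueError("at least one sample is required")
    # Count the samples whose predicted label differs from the expected label.
    wrong = sum(1 for p, t in zip(predicted, expected) if p != t)
    return wrong / len(expected)
```

With four samples of which one is misclassified, the function returns 0.25.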


Claim 41
Mehrotra does not disclose expressly wherein correcting the problem detected by the learning coach machine learning system comprises: adding or deleting, by the computer system, a connection between the first node and the second node.
Nugent discloses wherein correcting the problem detected by the learning coach machine learning system comprises: adding or deleting, by the computer system, a connection between the first node and the second node. (Nugent, 0126; Thus, as indicated at block 1010, as the electric field is applied across the connection gap, the more the nonconductor(s) will align and the stronger the connection becomes. Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD. EC: If the connections remain in use, they remain.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 42
Mehrotra does not disclose expressly wherein correcting the problem detected by the learning coach machine learning system comprises: unfreezing, by the computer system, a connection weight between the first node and the second node.
Nugent discloses wherein correcting the problem detected by the learning coach machine learning system comprises: unfreezing, by the computer system, a connection weight between the first node and the second node. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 53
Mehrotra, p3; For large applications, the amount of training time is large, requiring several days even on the fastest processors, irrespective of whether the training method is per-epoch or per pattern. The amount of training time can be reduced by exploiting parallelism in per-epoch training, where each of P different processors calculates weight changes for each pattern independently, followed by a phase in which all errors are summed up.): the neural network comprising a first node and a second node (Mehrotra, p19, fig 1.13 and 1.14), the first and second nodes comprising activation functions (Mehrotra, p10, fig 1.4; f(w1x1 + … + wnxn) EC: This function occurs between every edge and associated nodes.) that are evaluatable on a dataset according to an objective defined by an objective function (Mehrotra, p19, fig 1.13 and 1.14; EC: The data set is the incoming data into layer 0.); and…. based at least in part on activation values of the first node and estimates of partial derivatives of the objective with respect to activation values of the second node. (Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus the error is used to adjust the weights. This happens in the connection between one node and another node.)
Mehrotra does not disclose expressly instructions that, when executed by the processor, cause the computer system to: estimate an effect on the objective caused by the existence or nonexistence of a direct connection between the first node and the 
Nugent discloses instructions that, when executed by the processor, cause the computer system to: estimate an effect on the objective caused by the existence or nonexistence of a direct connection between the first node and the second node (Nugent; 0110-0111; ‘In other words, either there can be a connection or no connection.’ And ‘One method for solving this problem is to utilize two sets of connections for the same output, having one set represent the positive connections and the other set represent the negative connections. The output of these two layers can be compared, and the layer with the greater output will output either a high signal or a low signal, depending on the type of connection set (inhibitory or excitatory).’) ….change a structure of the neural network based at least in part on whether the direct connection between the first node and the second node exists. (Nugent; 0110; That signal is transferred into the flow of a neurotransmitter whose effect on the receiving neuron can be either excitatory or inhibitory, depending on the neuron, thereby dedicating certain connections inhibitory and excitatory.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 54
Mehrotra discloses wherein the memory stores instructions that cause the computer system to estimate the effect on the objective by computing a weighted average over the dataset of a product of an activation of the first node (Mehrotra, p10, fig 1.4) and a partial derivative of the objective with respect to an input to the second node for each data item in the dataset. (Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus the error is used to adjust the weights. This happens in the connection between one node and another node.)
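As an illustration only of the quantity recited in this limitation, the following Python sketch computes a weighted average, over data items, of the product of the first node's activation and the partial derivative of the objective with respect to the second node's input. The function name, the argument names, and the optional per-item weights are assumptions made for this sketch; they are taken neither from the application nor from Mehrotra.

```python
def estimate_connection_effect(activations, partials, weights=None):
    """Weighted average over the dataset of a_i(d) * dJ/dz_j(d).

    activations: hypothetical list of first-node activations, one per data item.
    partials:    hypothetical list of partial derivatives of the objective with
                 respect to the second node's input, one per data item.
    weights:     optional per-item weights (assumed uniform when omitted).
    """
    n = len(activations)
    if n == 0 or len(partials) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights must have one entry per data item")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Sum the per-item products, then normalize by the total weight.
    return sum(w * a * g for w, a, g in zip(weights, activations, partials)) / total
```

With uniform weights the result reduces to the plain mean of the per-item products.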

Claim 55
Mehrotra discloses wherein the dataset comprises a development dataset set (Mehrotra, p51; In some problems, the input dimensions are non-numeric, and their values do not have any inherent order. For instance, the input dimension may be "color," and its values may range over the set {red, blue, green, yellow} .) aside from a training dataset on which the neural network was trained. (Mehrotra, p25; We are provided with a "training set" consisting of sample patterns that are representative of all classes, along with class membership information for each pattern . Using the training set, we deduce rules for membership in each class and create a classifier, which can then be used to assign other patterns to their respective classes according to these rules.)

Claim 56
Mehrotra does not disclose expressly the first node and the second node are not directly connected; and the memory stores instructions that cause the computer system to: estimate the effect on the objective by estimating the effect on the objective of adding the direct connection between the first node and the second node; and change the structure of the neural network by adding the direct connection between the first node and the second node based at least in part on whether the estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection.
Nugent discloses the first node and the second node are not directly connected (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance ( no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states).); and the memory stores instructions that cause the computer system to: estimate the effect on the objective by estimating the effect on the objective of adding the direct connection between the first node and the second node; and change the structure of the neural network by adding the direct connection between the first node and the second node based at least in part on whether the estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection. (Nugent; 0193; Each Known synapse can be composed of connection conduits, 

Claim 57
Mehrotra does not disclose expressly the first node and the second node are directly connected; the memory stores instructions that cause the computer system to estimate the effect on the objective by estimating the effect on the objective of deleting the direct connection between the first node and the second node; and change the structure of the neural network by deleting the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the non-existence of the direct connection.
Nugent discloses the first node and the second node are directly connected; the memory stores instructions that cause the computer system to estimate the effect on the objective by estimating the effect on the objective of deleting the direct connection between the first node and the second node; and change the structure of the neural network by deleting the direct connection between the first node and the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the non-existence of the direct connection. (Nugent, 0126; Thus, as indicated at block 1010, as the electric field is applied across the connection gap, the more the nonconductor(s) will align and the stronger the connection becomes. Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 58
Mehrotra does not disclose expressly wherein the memory stores instructions that cause the computer system to delete the direct connection between the first node and the second node further based at least in part on a magnitude of a connection weight associated with the direct connection between the first node and the second node.
Nugent discloses wherein the memory stores instructions that cause the computer system to delete the direct connection between the first node and the second node further based at least in part on a magnitude of a connection weight associated with the direct connection between the first node and the second node. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 61
Mehrotra discloses the neural network comprises a first subnetwork and a second subnetwork; the first node is located in the first subnetwork; the second node is located in the second subnetwork. (Mehrotra, p20, fig 1.15; First node in first 
Mehrotra does not disclose expressly the memory stores instructions that cause the computer system to change the structure of the neural network by adding the direct connection from the first node to the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection.
Nugent discloses the memory stores instructions that cause the computer system to change the structure of the neural network by adding the direct connection from the first node to the second node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to the existence of the direct connection. (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance (no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states). EC: With the ability to make connections or break connections, Nugent can alter the schematic design of the network into an ‘improved’ design. With altering the strength of the electrical field, Nugent can alter the resistance between the nodes which maps to a software designed ‘weight’ value.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 64
Mehrotra does not disclose expressly the first node and the second node form a cover pair in a strict partial order corresponding to a transitive closure of the neural network; and the memory stores instructions that cause the computer system to change the structure of the neural network by deleting a first direct connection from the first node to the second node and adding a second direct connection from the second node to the first node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to replacement of the first direct connection with the second direct connection.
Nugent discloses the first node and the second node form a cover pair in a strict partial order corresponding to a transitive closure of the neural network; and the memory stores instructions that cause the computer system to change the structure of the neural network by deleting a first direct connection from the first node to the second node and adding a second direct connection from the second node to the first node based at least in part on whether an estimate of the effect on the objective indicates improvement in a performance of the neural network with respect to the objective due to replacement of the first direct connection with the second direct connection. (Nugent; 0107; The connection possesses a resistance somewhere between a minimum intrinsic resistance (maximum particles bridging gap, nanowire cross junctions closed, or conducting states of molecular switches) and a maximum intrinsic resistance (no particles bridging gap, no nanowire cross junctions open, or no non-conducting molecular switch states). EC: The ‘cover pair’ maps to the junctions. ‘Transitive closure’ is the potential connection between them with the nanoparticles. With the ability to make connections or break connections, Nugent can alter the schematic design of the network into an ‘improved’ design. With altering the strength of the electrical field, Nugent can alter the resistance between the nodes which maps to a software designed ‘weight’ value.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 65
Mehrotra does not disclose expressly wherein the memory stores instructions that cause the computer system to freeze the direct connection between the first node and the second node such that while the direct connection is frozen a connection weight of the direct connection is not changed during training of the neural network.
Nugent discloses wherein the memory stores instructions that cause the computer system to freeze the direct connection between the first node and the second node such that while the direct connection is frozen a connection weight of the direct connection is not changed during training of the neural network. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD. EC: this discloses that as long as the electrical field is maintained, the connection is ‘frozen.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 68
Mehrotra does not disclose expressly wherein the memory stores instructions that cause the computer system to unfreeze the frozen direct connection.
Nugent discloses wherein the memory stores instructions that cause the computer system to unfreeze the frozen direct connection. (Nugent, 0126; Connections (i.e., synapses) that are not used are dissolved back into the solution, as illustrated at block 1012. As illustrated at block 1014, the resistance of the connection can be maintained or lowered by selective activations of the connections. In other words, "if you do not use the connection, it will fade away," much like the connections between neurons in a human brain in response to Long Term Depression, or LTD.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 71
Mehrotra discloses the dataset comprises a first dataset (Mehrotra, p37; It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i .e ., the system must perform well on new test data distinct from training data . Consider a child learning addition of one digit numbers. EC: First dataset maps to training data.); and the memory stores instructions that cause the computer system to: assign a data influence weight to each data item in the first dataset; train the neural network on the first dataset via stochastic gradient descent (Mehrotra, p43; Such a perceptron can be represented by a single node that applies a step function to the net weighted sum of its inputs . The input pattern is considered to belong to one class or the other depending on whether the node output is 0 or 1.), which comprises: computing a weighted average of an estimate of a gradient in each Mehrotra, p10, fig 1.4, p43; Rosenblatt (1958) defines a perceptron to be a machine that learns, using examples, to assign input vectors (samples) to different classes, using a linear function of the inputs. Minsky and Papert (1969) instead describe the perceptron as a stochastic gradient-descent algorithm that attempts to linearly separate a set of n-dimensional training data.) 
measure, during training of the neural network, a performance of the neural network on a second dataset (Mehrotra, p33-34; If the system behavior changes with time, the same network that is used to generate inputs for the system may also be continually trained on-line, i.e., its weights are adjusted depending on the error measure N(i) - N(S(N(i))), the deviation between the network outputs N(i) and the result of applying the network to the system's output when applied to the network's outputs.); wherein the second dataset is disjoint from the first dataset (Mehrotra, p37; It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i.e., the system must perform well on new test data distinct from training data. Consider a child learning addition of one digit numbers. EC: Second dataset maps to testing data.); and adjust, during training of the neural network, the data influence weight of one or more data items in the first dataset based on the performance of the neural network. (Mehrotra, p56; This approach, called the pocket algorithm with ratchet, ensures that the pocket weights always "ratchet up": a set of weights w1 in the pocket is replaced by a set of weights w2 with a longer successful run only after testing (on all training samples) whether w2 does correctly classify a greater number of samples than w1.)
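The pocket algorithm with ratchet quoted above from Mehrotra can be sketched in Python as follows. This is a minimal illustration of the quoted passage only, assuming a two-class perceptron with a step output and random sample selection; the function names, the iteration count, and the seed are hypothetical and are taken neither from the reference nor from the application.

```python
import random


def predict(w, x):
    """Step-function perceptron output: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def count_correct(w, samples):
    """Number of training samples that the weights w classify correctly."""
    return sum(1 for x, y in samples if predict(w, x) == y)


def pocket_with_ratchet(samples, n_iters=1000, seed=0):
    """Return (pocket_weights, number_correct) after n_iters random presentations.

    samples is a list of (input_vector, label) pairs with labels 0 or 1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    pocket, pocket_run = list(w), 0
    pocket_correct = count_correct(pocket, samples)
    run = 0  # length of the current run of consecutive correct classifications
    for _ in range(n_iters):
        x, y = rng.choice(samples)
        if predict(w, x) == y:
            run += 1
            # Ratchet: a longer run alone is not enough; the candidate must
            # also classify more training samples correctly than the pocket.
            if run > pocket_run:
                correct = count_correct(w, samples)
                if correct > pocket_correct:
                    pocket, pocket_run, pocket_correct = list(w), run, correct
        else:
            # Standard perceptron update on a misclassified sample.
            sign = 1 if y == 1 else -1
            w = [wi + sign * xi for wi, xi in zip(w, x)]
            run = 0
    return pocket, pocket_correct
```

The extra check against all training samples is what distinguishes the ratchet variant: the pocket weights can only be replaced by weights that classify strictly more training samples correctly.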

Claim 72
Mehrotra does not disclose expressly wherein the memory stores instructions that cause the computer system to adjust the data influence weight by setting the data influence weight to zero.
Nugent discloses wherein the memory stores instructions that cause the computer system to adjust the data influence weight by setting the data influence weight to zero. (Nugent, 0166; Since no connections are present, the voltage at neurons A, B, C and D are all zero and consequently all neurons output zero.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 75
Mehrotra discloses wherein the memory stores instructions that cause the computer system to train the second node to match an output of the first node. (Mehrotra, p20, fig 1.15; One node from the second layer (second node) is connected to one node from the first layer (first node). EC: There is no specific meaning within the specification of what is meant by ‘match.’)

Claim 76
Mehrotra, p72; The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to o_k and then differentiate o_k with respect to w_{k,j}. EC: The objective is the preferred outcome and thus the error is used to adjust the weights. This happens in the connection between one node and another node.)

Claim 78
Mehrotra discloses the neural network comprises: a first subnetwork for performing a first machine learning task; and a second subnetwork for performing a second machine learning task that is distinct from the first machine learning task; the first node is located in the first subnetwork; and the second node is located in the second subnetwork. (Mehrotra, p20, fig 1.15; First node in first subnetwork maps to any node in layer 1. The second node in the second subnetwork maps to any node in layer 2. The weights associated with the first layer and the weights associated with the second layer disclose different learning tasks.)


Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra and Nugent as applied to claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42 above, and further in view of Thibadeau. (U. S. Patent Publication 20050259895, referred to as Thibadeau) 

Claim 7
Mehrotra and Nugent do not disclose expressly wherein the neural network comprises a strict partially ordered set.
Thibadeau discloses wherein the neural network comprises a strict partially ordered set. (Thibadeau, 0021; The script (ordered sequence of commands and information) defines a strict partially ordered set.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Thibadeau before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a partially ordered set of Thibadeau. Given that a physical neural network must have instructions in sequence to parallel the sequence of operations of a neural network, one having ordinary skill in the art would have been motivated to make this obvious modification. 


Claim(s) 13-14, 35-36, 62-63 and 84 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra and Nugent as applied to claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42, 53-58, 61, 64-65, 68, 71-72, 75-76 and 78 above, and further in view of Mandel. (U. S. Patent Publication 20170223190, referred to as Mandel) 

Claim 13
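Mehrotra and Nugent do not disclose expressly wherein changing the structure of the neural network is controlled by a machine learning system executed by the computer system.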
Mandel discloses wherein changing the structure of the neural network is controlled by a machine learning system executed by the computer system. (Mandel, 0121; Note that an output of the first machine learning system is used to train the second machine learning system.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage of a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 14
Mehrotra and Nugent do not disclose expressly wherein the machine learning system comprises a learning coach machine learning system.
Mandel discloses wherein the machine learning system comprises a learning coach machine learning system. (Mandel, 0121; Note that an output of the first machine learning system is used to train the second machine learning system.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage of a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 35
Mehrotra discloses training, by the computer system, the neural network on the dataset according to the objective. (Mehrotra, p33-34; If the system behavior changes with time, the same network that is used to generate inputs for the system may also be continually trained on-line, i .e., its weights are adjusted depending on the error measure N(i) — N(S(N(i))), the deviation between the network outputs N(i) and the result of applying the network to the system's output when applied to the network's outputs.)
Mehrotra does not disclose expressly detecting, by the computer system, a problem in a learning process of the neural network during training of the neural network.
Nugent discloses detecting, by the computer system, a problem in a learning process of the neural network during training of the neural network. (Nugent, 0106; Basically, the connections in a connection network must be able to change in accordance with the feedback provided.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Mehrotra and Nugent do not disclose expressly via a learning coach machine learning system; and correcting, by the computer system, the problem detected by the learning coach machine learning system.
Mandel discloses via a learning coach machine learning system; and correcting, by the computer system, the problem detected by the learning coach machine learning system. (Mandel, 0123; In a Determine Step 365, the second machine learning system is optionally used to determine if the second customer service inquiry should be provided to the first machine learning system for generation of at least a partial response to the second customer service inquiry. For example, once trained the second machine learning system may be used to determine acceptance and/or routing of additional customer service inquires.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage of a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 36
Mehrotra discloses wherein detecting the problem in the learning process of the neural network comprises: detecting, by the computer system, whether a magnitude of (Mehrotra, p43-44; Minsky and Papert (1969) instead describe the perceptron as a stochastic gradient-descent algorithm that attempts to linearly separate a set of n-dimensional training data. An example of a threshold is ‘output = 1 if (x1 – x2) > 2.’)

Claim 62
Mehrotra and Nugent do not disclose expressly the memory stores a machine learning system for controlling the neural network; and the memory stores instructions that cause the computer system to change the structure of the neural network according to the machine learning system.
Mandel discloses the memory stores a machine learning system for controlling the neural network; and the memory stores instructions that cause the computer system to change the structure of the neural network according to the machine learning system. (Mandel, 0121; Note that an output of the first machine learning system is used to train the second machine learning system.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage that a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification.


Mehrotra and Nugent do not disclose expressly wherein the machine learning system comprises a learning coach machine learning system.
Mandel discloses wherein the machine learning system comprises a learning coach machine learning system. (Mandel, 0121; Note that an output of the first machine learning system is used to train the second machine learning system.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage that a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 84
Mehrotra discloses wherein the memory stores instructions that cause the computer system to: train the neural network on the dataset according to the objective. (Mehrotra, p33-34; If the system behavior changes with time, the same network that is used to generate inputs for the system may also be continually trained on-line, i .e., its weights are adjusted depending on the error measure N(i) — N(S(N(i))), the deviation between the network outputs N(i) and the result of applying the network to the system's output when applied to the network's outputs.)
Mehrotra does not disclose expressly detect a problem in a learning process of the neural network during training of the neural network.
Nugent discloses detect a problem in a learning process of the neural network during training of the neural network. (Nugent, 0106; Basically, the connections in a connection network must be able to change in accordance with the feedback provided.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Nugent before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate a physical dynamic neural network machine of Nugent. Given the advantage of disclosing a number of properties and characteristics that translate a neural network design into a physical machine, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra and Nugent do not disclose expressly via a learning coach machine learning system; and correct the problem detected by the learning coach machine learning system.
Mandel discloses via a learning coach machine learning system; and correct the problem detected by the learning coach machine learning system. (Mandel, 0123; In a Determine Step 365, the second machine learning system is optionally used to determine if the second customer service inquiry should be provided to the first machine learning system for generation of at least a partial response to the second customer service inquiry. For example, once trained the second machine learning system may be used to determine acceptance and/or routing of additional customer service inquiries.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mandel before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate a learning system of Mandel. Given the advantage that a learning system is dynamic and adjustable and can be used as a trainer for another system, one having ordinary skill in the art would have been motivated to make this obvious modification.


Claim(s) 24-25, 73-74 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra and Nugent as applied to claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42, 53-58, 61, 64-65, 68, 71-72, 75-76 and 78 above, and further in view of Trenholm. (U. S. Patent Publication 20180017501, referred to as Trenholm) 

Claim 24
Mehrotra and Nugent do not disclose expressly wherein measuring the performance of the neural network on the second dataset is semi-supervised.
Trenholm discloses wherein measuring the performance of the neural network on the second dataset is semi-supervised. (Trenholm, 0094; ‘The neural networks may be trained using supervised or unsupervised (or semi-supervised) learning techniques, as described above.’ And ‘Once trained, or optionally during training, test data can be provided to the neural network to provide an output.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Trenholm before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate the concept of a testing dataset and partial user interaction with training of Trenholm. Given the advantage of obtaining information concerning the performance of a neural network and allowing a user to intervene if required, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 25
Mehrotra and Nugent do not disclose expressly wherein semi-supervised measuring of the performance of the neural network on the second dataset comprises labeling, by the computer system, data items of the second dataset via a recognizer machine learning system.
Trenholm discloses wherein semi-supervised measuring of the performance of the neural network on the second dataset comprises labeling, by the computer system, data items of the second dataset via a recognizer machine learning system. (Trenholm, 0094; ‘The neural networks may be trained using supervised or unsupervised (or semi-supervised) learning techniques, as described above.’ And ‘Once trained, or optionally during training, test data can be provided to the neural network to provide an output.’ EC: A neural network is a classifier and thus can be seen as a ‘recognizer machine learning system.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Trenholm before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate the concept of a testing dataset and partial user interaction with training of Trenholm. Given the advantage of obtaining information concerning the performance of a neural network and allowing a user to intervene if required, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 73

Mehrotra and Nugent do not disclose expressly wherein the performance of the neural network on the second dataset is measured via semi-supervised machine learning.
Trenholm discloses wherein the performance of the neural network on the second dataset is measured via semi-supervised machine learning. (Trenholm, 0094; ‘The neural networks may be trained using supervised or unsupervised (or semi-supervised) learning techniques, as described above.’ And ‘Once trained, or optionally during training, test data can be provided to the neural network to provide an output.’)
It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Trenholm before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate the concept of a testing dataset and partial user interaction with training of Trenholm. Given the advantage of obtaining information concerning the performance of a neural network and allowing a user to intervene if required, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 74
Mehrotra and Nugent do not disclose expressly wherein the memory stores instructions that cause the computer system to label data items of the second dataset via a recognizer machine learning system for the semi-supervised machine learning.
Trenholm discloses wherein the memory stores instructions that cause the computer system to label data items of the second dataset via a recognizer machine learning system for the semi-supervised machine learning. (Trenholm, 0094; ‘The neural networks may be trained using supervised or unsupervised (or semi-supervised) learning techniques, as described above.’ And ‘Once trained, or optionally during training, test data can be provided to the neural network to provide an output.’ EC: A neural network is a classifier and thus can be seen as a ‘recognizer machine learning system.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Trenholm before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate the concept of a testing dataset and partial user interaction with training of Trenholm. Given the advantage of obtaining information concerning the performance of a neural network and allowing a user to intervene if required, one having ordinary skill in the art would have been motivated to make this obvious modification.


Claim(s) 28 and 77 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra and Nugent as applied to claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42, 53-58, 61, 64-65, 68, 71-72, 75-76 and 78 above, and further in view of Furber. (U. S. Patent 7457787, referred to as Furber) 

Claim 28
Mehrotra and Nugent do not disclose expressly wherein the activation functions comprise non-monotonic activation functions.
Furber discloses wherein the activation functions comprise non-monotonic activation functions. (Furber, claim 4; A method for providing a neural network component having neuron activity levels that are controllable to be either monotonic or non-monotonic functions of neuron inputs,….) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Furber before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate non-monotonic activation functions (sigmoid function) of Furber. Given the advantage of outputting a result other than a ‘1’ or a ‘0’, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 77
Mehrotra and Nugent do not disclose expressly wherein the activation functions comprise nonmonotonic activation functions.
Furber discloses wherein the activation functions comprise nonmonotonic activation functions. (Furber, claim 4; A method for providing a neural network component having neuron activity levels that are controllable to be either monotonic or non-monotonic functions of neuron inputs,….) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Furber before him before the effective filing date of the claimed invention, to modify Mehrotra and Nugent to incorporate nonmonotonic activation functions (sigmoid function) of Furber. Given the advantage of outputting a result other than a ‘1’ or a ‘0’, one having ordinary skill in the art would have been motivated to make this obvious modification.


Claim(s) 40 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra and Nugent as applied to claims 1-6, 8-12, 15-23, 26-27, 29-34, 37-39, 41-42, 53-58, 61, 64-65, 68, 71-72, 75-76 and 78 above, and further in view of Mikhael. (U. S. Patent 7016885, referred to as Mikhael) 

Claim 40
Mehrotra and Nugent do not disclose expressly wherein detecting the problem in the learning process of the neural network comprises: detecting, by the computer system, whether a performance of the neural network with respect to the objective on the dataset is worse than the performance of an ensemble of neural networks by an amount exceeding a criterion.
Mikhael discloses wherein detecting the problem in the learning process of the neural network comprises: detecting, by the computer system, whether a performance of the neural network with respect to the objective on the dataset is worse than the performance of an ensemble of neural networks by an amount exceeding a criterion. (Mikhael, c2:45-c3:5; In the field of pattern recognition, the combination of an ensemble of neural networks has been to achieve image classification systems with higher performance in comparison with the best performance achievable employing a single neural network. EC: Here Mikhael discloses the concept of evaluating the performance of a single neural network to a plurality of neural networks.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Nugent and Mikhael before him before the effective filing date of the claimed invention, . 
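For illustration only, the comparison recited in claim 40 can be sketched as follows. This sketch is not taken from Mikhael, Mehrotra, Nugent or the claims. The function names, the use of classification accuracy as the performance measure, the majority-vote ensemble and the 0.05 margin are all assumptions chosen to make the example concrete.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels (assumed performance measure)."""
    return mean(1.0 if p == y else 0.0 for p, y in zip(predictions, labels))


def majority_vote(member_predictions):
    """Combine per-member prediction lists into one list by majority vote.

    Ties are broken in favor of the smallest class value, purely so that the
    result is deterministic in this sketch.
    """
    voted = []
    for votes in zip(*member_predictions):
        voted.append(max(sorted(set(votes)), key=votes.count))
    return voted


def underperforms_ensemble(single_preds, member_preds, labels, margin=0.05):
    """Return True when the single network trails the ensemble by more than margin.

    The margin plays the role of the hypothetical "criterion" in this sketch.
    """
    single_acc = accuracy(single_preds, labels)
    ensemble_acc = accuracy(majority_vote(member_preds), labels)
    return (ensemble_acc - single_acc) > margin
```

With three ensemble members whose majority vote is correct on every item and a single network that is correct on five of eight items, `underperforms_ensemble` returns `True` for the default margin.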


Response to Arguments
3.	Applicant’s arguments filed on 2/19/2021 for claims 1-58, 61-65, 68, 71-78, 84, 92-93 have been fully considered but are not persuasive.

4.	Applicant’s argument:
Independent claims 1 and 53 were rejected under § 103 as being obvious over Mehrotra (“Elements of artificial neural networks”) and Nugent (Pub. 2005/0015351). Mehrotra discloses conventional back-propagation computations made when training a neural network (see Mehrotra § 3.3), but, as acknowledged in the Office Action, Mehrotra does not teach or suggest the step of “estimating ... an effect on the objective caused by the existence or non-existence of a direct connection” between two nodes of the network. Office Action p. 4. The Office Action is also correct that Mehrotra does not disclose the step of “changing ... a structure of the neural network based at least in part on the estimate of the effect.” Id. Indeed, because Mehrotra does not disclose “estimating ... the effect,” Mehrotra cannot possibly disclose changing a “structure of the neural network based at least in part on the estimate of the effect.”

Examiner’s answer:
‘Estimating’ is not a term of art and is recited at three different locations within claim 1. The first is plain ‘estimating.’ Since a neural network is a classifier, the examiner views this ‘estimating’ as a result of training a neural network to produce a desired result.
The second ‘estimating’ is in regard to a partial derivative. Mehrotra discloses this by: ‘The error E depends on w_{k,j} only through o_k, i.e., no other output term o_{k'}, k' ≠ k, contains w_{k,j}. Hence, for the calculations that follow, it is sufficient to restrict attention to ∂E/∂o_k and then differentiate o_k with respect to w_{k,j}.’ (Mehrotra, p72) EC: The objective is the preferred outcome, and thus the error is used to adjust the weights. This occurs over the connection from one node to another node.
The third ‘estimating’ is on the ‘effect.’ Like the first ‘estimating,’ this is in regard to the outcome of the training.
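For reference, the quoted Mehrotra passage concerning the second ‘estimating’ corresponds to the standard chain-rule step of back-propagation. The following is a sketch in the passage's own symbols (E, o_k, w_{k,j}); the learning rate η and the gradient-descent update in the second line are assumptions added for illustration and are not quoted from Mehrotra.

```latex
\frac{\partial E}{\partial w_{k,j}}
  = \frac{\partial E}{\partial o_k}\,\frac{\partial o_k}{\partial w_{k,j}},
\qquad
\Delta w_{k,j} = -\eta\,\frac{\partial E}{\partial w_{k,j}}
```

Because no other output term o_{k'} with k' ≠ k contains w_{k,j}, only the single product on the right-hand side of the first equation contributes to the derivative.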

5.	Applicant’s argument:
Nugent notes that making connections in a physical neural network that can take on both positive and negative values is problematic. Id. [0109]-[0110]. To overcome this problem, Nugent’s physical neural network uses two sets of connections—one set for positive connections and another for negative connections. Id. [0111]. An example of such dual sets of connections is shown in Nugent’s Figure 5, which shows two sets of actual connections: connections 512-520 for positive connections and connections 522-530 for negative connections. As such, there are no non-connections in Nugent and Nugent does not show changing the structure of the neural network based on the estimate of the effect on the objective for the neural network caused by the existence or non-existence of a direct connection.

Examiner’s answer:

Applicant’s assumption that

‘As such, there are no non-connections in Nugent and Nugent does not show changing the structure of the neural network based on the estimate of the effect on the objective for the neural network caused by the existence or non-existence of a direct connection’
is incorrect. Nugent states, ‘In other words, either there can be a connection or no connection.’ [0110] ‘Negative connection’ and ‘no connection’ are two separate things. The alignment of the nanoparticles makes the direct connection. If there is no signal, then there is no connection.


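	Claims 43-52 and 92-93 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.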

Conclusion – Final
7.	THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Correspondence Information
8.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.

Any response to this Office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121