Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action

1.	The Examiner acknowledges the applicant’s amendment filed 8/25/2022.  At this point claims 102-103, 105, 107-112, 114, 116-117, 119, 121-126, 128 and 130-131 are pending in the instant application and ready for examination by the Examiner.

2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/25/2022 has been entered.

Response to Arguments
3.	Applicant’s arguments filed on 8/25/2022 for claims 102-103, 105, 107-112, 114, 116-117, 119, 121-126, 128 and 130-131 have been fully considered but are not persuasive.

4.	Applicant’s argument:
In the independent claims, as amended, the “first node” is added to the ANN upon, and in response to, detection of slow learning by the ANN. In claims 102 and 116, the first node performs a “Boolean logic operation.” In claims 109 and 123, the activation value of the first node is computed “using an absolute value of a difference between” the inputs to the first node. The cited prior art does not teach adding a node to a neural network “upon, and in response to, identification of slow learning by the ANN ....”. The Office Action relied on Andoni for both (1) identifying slow learning by the ANN and (ii) “upon identification of slow learning, adding ... a first node to the first inner layer of the ANN.” Office Action at 4-5. Andoni, however, does not teach or suggest adding a node to an inner layer of an ANN upon, and in response to, identification of slow learning by the ANN, much less the special types of nodes recited in the independent claims.

Examiner’s answer:
The amended claims required new art. The newly cited references describe cascade-correlation learning, which discloses the addition of a new node within the first layer when learning slows.

5.	Applicant’s argument:
Andoni teaches that learning can be slow. Andoni §0002. Andoni also discloses a “mutation operation 170” that can add a node on a hidden (or inner) layer. Id. § 0082. However, Andoni does not disclose, teach or suggest adding a hidden layer node to a neural network “upon, and in response to, identification of slow learning” by the neural network. To the contrary, Andoni describes that the mutation operation is a “random or pseudo-random biological operator ....” Id. There is no suggestion that a mutation in Andoni that adds a hidden layer node is in response to detection of slow learning by the neural network. Moreover, Andoni does not disclose the types of nodes described in the independent claims, much less adding those types of nodes when slow learning is detected. Indeed, none of the cited references teach or suggest adding the specific types of nodes recited in the independent claims in response to identification of slow learning.

Examiner’s answer:
Andoni is not cited for these independent claim limitations. 

6.	Applicant’s argument:
Relevant to claims 102 and 116, the Office Action is also incorrect that Mehrotra discloses a node that can implement a Boolean logic operation. Office Action at 3-4. Mehrotra discloses that a network having one hidden layer, which equates to three total layers (the input, hidden and output layers) can implement a binary function. See Mehrotra p. 90. That is not the same as a node that implements a Boolean logic operation.

Examiner’s answer:
Mehrotra is used for 4 different limitations. 
training, by a computer system, a nodal network over a series of multiple iterations, 
wherein the nodal network comprises an artificial neural network (ANN) that is trained iteratively through machine learning with a plurality of training data items (Mehrotra, p177; In the first phase, hidden layer nodes are used to divide the training set into clusters containing similar input patterns . The learning rule is identical to that of the simple competitive learning algorithm discussed in section 5 .1 .3, with the difference that the learning rate q (l) (t) is steadily reduced . Consequently, weights change much less in later iterations of this phase of the algorithm, stabilizing the first layer of weights such that the same hidden layer node continues to be the winner for similar input patterns.); 
the ANN comprises a plurality of layers, including an input layer, an output layer, and at least a first inner layer between the input and output layers (Mehrotra, p19, fig 1.14);  
wherein the first node has first binary input from a second node of the ANN, a second binary input from a third node of the ANN, and an activation function, such that an activation value of the first node is an output of a Boolean logic operation applied to the first and second binary inputs (Mehrotra, p90, p2 fig 1.1, pp2-3; ‘A neural network with one hidden layer can represent any binary function. To understand the reason for this, suppose f(x1, . . . , xn) is a function in n variables such that xi ∈ {0, 1} for i = 1, . . . , n, and the output f(x1, . . . , xn) ∈ {0, 1}.’ And ‘The "AND" of two binary inputs is an elementary logical operation, implemented in hardware using an "AND gate."’); and
Boolean logic has variables with the values ‘true’ or ‘false’, which are denoted as 1 or 0 (binary).
The claimed artificial neural network definition is open-ended and is by no means limited to only 3 layers.

7.	Applicant’s argument:
Relevant to claims 109 and 123, the Office Action acknowledges that neither Mehrotra, Andoni nor Lelescu disclose adding a node whose activation value is computed using an absolute value of the inputs to the node. Office Action at 14. The Office Action relies on Alaghi as “disclosing” that aspect of claims 109 and 123. Alaghi, however, does not disclose a node of an ANN whose activation value is computed using an absolute value of the inputs to the node. Instead, Alaghi disclose a logic circuit gate that can implement, for example, a XOR function. Additionally, the Office Action does not provide any explanation for why a person of ordinary skill in the art would modify Mehrotra’s neural networks to include logic circuit gates. For example, there is no explanation of how a person of ordinary skill in the art would employ the training techniques described in Mehrotra for neural networks if those neural networks included logic circuit gates.

Examiner’s answer:
There is no claim limitation stating, ‘adding a node whose activation value is computed using an absolute value of the inputs to the node.’ The limitations have the steps of a) determining whether learning has slowed, b) adding a node to the first layer, and c) the output of the node being binary (Boolean), implementing an absolute value concept. The amended claims reflect the cascade-correlation learning architecture concept. Both references recite basic concepts of this learning method. Mehrotra, like Alaghi, is an undergraduate-level textbook. Therefore, these references represent basic general knowledge of those skilled in the art.

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim(s) 102 and 130 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra in view of Fahlman in view of Phatak and further in view of Lelescu. (‘Elements of artificial neural networks’, referred to as Mehrotra; ‘The Cascade-Correlation Learning Architecture’, referred to as Fahlman; ‘Connectivity and Performance Tradeoffs in the Cascade Correlation Learning Architecture’, referred to as Phatak; U. S. Patent Publication 20170053382, referred to as Lelescu)

Claim 102
Mehrotra discloses a method comprising: training, by a computer system, a nodal network over a series of multiple iterations, wherein the nodal network comprises an artificial neural network (ANN) that is trained iteratively through machine learning with a plurality of training data items (Mehrotra, p177; In the first phase, hidden layer nodes are used to divide the training set into clusters containing similar input patterns . The learning rule is identical to that of the simple competitive learning algorithm discussed in section 5 .1 .3, with the difference that the learning rate q (l) (t) is steadily reduced . Consequently, weights change much less in later iterations of this phase of the algorithm, stabilizing the first layer of weights such that the same hidden layer node continues to be the winner for similar input patterns.); the ANN comprises a plurality of layers, including an input layer, an output layer, and at least a first inner layer between the input and output layers (Mehrotra, p19, fig 1.14); …. wherein the first node has first binary input from a second node of the ANN, a second binary input from a third node of the ANN, and an activation function, such that an activation value of the first node is an output of a Boolean logic operation applied to the first and second binary inputs (Mehrotra, p90, p2 fig 1.1, pp2-3; ‘A neural network with one hidden layer can represent any binary function. To understand the reason for this, suppose f(x1, . . . , xn) is a function in n variables such that xi ∈ {0, 1} for i = 1, . . . , n, and the output f(x1, . . . , xn) ∈ {0, 1}.’ And ‘The "AND" of two binary inputs is an elementary logical operation, implemented in hardware using an "AND gate."’); and
Mehrotra does not disclose expressly and training the ANN comprises: identifying, by the computer system, slow learning by the ANN; upon, and in response to identification of slow learning by the ANN, adding, by the computer system, a first node …. resuming training, by the computer system of the ANN with the first node added, wherein identifying the slow training by the ANN comprises detecting, by the computer system, that, for each iteration in multiple successive iterations.
Fahlman discloses and training the ANN comprises: identifying, by the computer system, slow learning by the ANN; upon, and in response to identification of slow learning by the ANN, adding, by the computer system, a first node (Fahlman, p526; ‘At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’) …. resuming training, by the computer system of the ANN with the first node added, wherein identifying the slow training by the ANN comprises detecting, by the computer system, that, for each iteration in multiple successive iterations. (Fahlman, p526; ‘At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Fahlman before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate an introduction of cascade correlation training of a neural network of Fahlman. Given the advantage of the addition of a node to a neural network to alter the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
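For illustration only, the patience-based cycle described in the Fahlman passage quoted above can be sketched as follows. This is a minimal sketch and not Fahlman's implementation: the names `has_plateaued`, `train_with_growth`, `train_cycle`, `add_hidden_unit`, `min_reduction`, `max_units` and `max_cycles` are hypothetical, and the reading of "no significant error reduction" as an absolute difference below `min_reduction` is an assumption.

```python
from typing import Callable, List


def has_plateaued(errors: List[float], patience: int, min_reduction: float) -> bool:
    """True when the last `patience` cycles reduced the error by less than
    `min_reduction` relative to the cycle just before them (assumed criterion)."""
    if len(errors) <= patience:
        return False
    reference = errors[-(patience + 1)]
    return reference - min(errors[-patience:]) < min_reduction


def train_with_growth(
    train_cycle: Callable[[], float],      # hypothetical: runs one cycle, returns the error
    add_hidden_unit: Callable[[], None],   # hypothetical: installs one new hidden unit
    target_error: float,
    patience: int,
    min_reduction: float,
    max_units: int,
    max_cycles: int,
) -> int:
    """Train, add a unit whenever learning stalls, resume; return units added."""
    errors: List[float] = []
    units_added = 0
    for _ in range(max_cycles):
        errors.append(train_cycle())
        if errors[-1] <= target_error:
            break                          # satisfied with the performance: stop
        if has_plateaued(errors, patience, min_reduction):
            if units_added == max_units:
                break                      # give up
            add_hidden_unit()
            units_added += 1
            errors.clear()                 # restart the patience window, resume training
    return units_added
```

The unit is added only after the stall is detected, and training then resumes with the enlarged network, which is the ordering the quoted passage describes.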
Mehrotra and Fahlman do not disclose expressly to the first inner layer of the ANN.
Phatak discloses to the first inner layer of the ANN. (Phatak, p931; ‘Step 2: Install the first hidden layer, adding one unit at a time. Hidden units in this layer receive connections only from the input units and are not connected to other hidden units. To install a hidden unit, it is connected to all of the network inputs and the connections are trained to maximize the correlation between its output and the residual error. Note that all the weights (input as well as output weights) associated with previously installed hidden units are held fixed when the input connections of the new unit are being trained. As a result, the new unit sees smaller residual error (at installation time) than previous units, because the previous units have already reduced the total error. The input-side weights of the new hidden unit remain frozen hereafter, as in the original Cascade Correlation algorithm. In the second phase of installing a unit, its output is connected to all the network output units. All the fan-in connections of the output-layer units (those emanating from previously installed hidden units as well as those connected to the hidden unit being currently installed), and their biases are then trained to minimize the error.’ EC: therefore, the new node is added to the first inner layer of the neural network.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman and Phatak before him before the effective filing date of the claimed invention, to modify Mehrotra and Fahlman to incorporate additional introduction information of cascade correlation training of a neural network of Phatak. Given the advantage that adding a node to the first layer of a neural network has the biggest effect on the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman and Phatak do not disclose expressly a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value.
Lelescu discloses a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value. (Lelescu, 0307; Using the approaches described for the determination of the gradients in Eq. (2), this is performed iteratively in Eq. (1), until a stopping criterion is reached (e.g., a norm on the variation of the estimate with iteration number falls below a certain threshold, or the maximum number of iterations is attained), as shown in FIG. 9.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak and Lelescu before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman and Phatak to incorporate the employment of gradients with a threshold for use of a decision engine of Lelescu. Given that the advantage of using gradients for optimum results in regard to training is known within the art, one having ordinary skill in the art would have been motivated to make this obvious modification.
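For illustration only, the two-threshold condition recited in the limitation above can be sketched as a check over a window of successive iterations. This is a sketch under stated assumptions, not a method taken from any cited reference: the names `gradient_norm`, `is_slow_learning`, `window`, `magnitude_threshold` and `change_threshold` are hypothetical, the gradient magnitude is taken to be the Euclidean norm, and the "rate of change" is read as the absolute difference between the magnitudes at consecutive iterations.

```python
import math
from typing import Sequence


def gradient_norm(grad: Sequence[float]) -> float:
    """Euclidean magnitude of a gradient vector (assumed choice of norm)."""
    return math.sqrt(sum(g * g for g in grad))


def is_slow_learning(
    grad_norms: Sequence[float],
    window: int,
    magnitude_threshold: float,   # the "first threshold value"
    change_threshold: float,      # the "second threshold value"
) -> bool:
    """True when, for each of the last `window` iterations, the gradient
    magnitude is below the first threshold and its change from the previous
    iteration in the window is below the second threshold."""
    if window < 2 or len(grad_norms) < window:
        return False
    recent = list(grad_norms[-window:])
    small = all(n < magnitude_threshold for n in recent)
    steady = all(abs(b - a) < change_threshold for a, b in zip(recent, recent[1:]))
    return small and steady
```

Requiring both conditions distinguishes a plateau (small, steady gradients) from a single iteration that happens to have a small gradient.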

Claim 130
Mehrotra discloses wherein training the nodal network comprises back-propagating partial derivatives through the ANN. (Mehrotra, p70-73; Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to ok and then differentiate ok with respect to…  EC: This is under the training algorithm of back-propagation.)

Claim(s) 103 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak and Lelescu as applied to claims 102, 130 above, and further in view of Wang. (‘New Efficient Design for XOR and XNOR Functions on the Transistor Level’ referred to as Wang)

Claim 103
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly possible values for each of the first and second binary inputs is zero and one; and possible values for the activation value of the first node are zero and one.
Wang discloses possible values for each of the first and second binary inputs is zero and one; and possible values for the activation value of the first node are zero and one. (Wang, table 1) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 105 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak and Lelescu as applied to claims 102, 130 above, and further in view of Berglund. (‘The Parameterless Self-Organizing Map Algorithm’ referred to as Berglund)

Claim 105
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly wherein the ANN comprises a self-organizing partially ordered network. 
Berglund discloses wherein the ANN comprises a self-organizing partially ordered network. (Berglund, p310; Finally, the weight update functions of the different algorithms give us the last piece of the explanation. Consider a map that receives an input far outside the area it is currently mapping, after already being partly through its annealing and, therefore, partially ordered.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Berglund before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate training a neural network of Berglund. Given the advantage of obtaining a usable classifier, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 107-109, 131 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak and Lelescu as applied to claims 102, 130 above, and further in view of Alaghi. (‘Exploiting Correlation in Stochastic Circuit Design’ referred to as Alaghi)

Claim 107
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly wherein the Boolean logic operation is an XOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are unequal; and have an activation value of 0 when the first and second binary inputs are equal.
Alaghi discloses wherein the Boolean logic operation is an XOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are unequal; and have an activation value of 0 when the first and second binary inputs are equal. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's view, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 108
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly wherein the Boolean logic operation is an XNOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are equal; and  have an activation value of 0 when the first and second binary inputs are unequal.
Alaghi discloses wherein the Boolean logic operation is an XNOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are equal; and have an activation value of 0 when the first and second binary inputs are unequal. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's view, one having ordinary skill in the art would have been motivated to make this obvious modification.
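For illustration only, the algebraic forms referred to above can be checked directly: on binary inputs, |x - y| reproduces the XOR truth table and 1 - |x - y| reproduces the XNOR truth table. The function names `xor_activation` and `xnor_activation` are hypothetical and do not appear in any cited reference.

```python
def xor_activation(x: float, y: float) -> float:
    """Absolute difference of the two inputs; equals XOR when x, y are 0 or 1."""
    return abs(x - y)


def xnor_activation(x: float, y: float) -> float:
    """One minus the absolute difference; equals XNOR when x, y are 0 or 1."""
    return 1 - abs(x - y)


# Print the truth table for the four binary input pairs.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_activation(x, y), xnor_activation(x, y))
```

The same expressions remain defined for inputs anywhere between 0 and 1, in which case the outputs also fall between 0 and 1.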

Claim 109
Mehrotra discloses a method comprising training, by a computer system, a nodal network, wherein the nodal network comprises: an artificial neural network (ANN) that is trained iteratively, over a series of iterations, through machine learning with a plurality of training data items (Mehrotra, p177; In the first phase, hidden layer nodes are used to divide the training set into clusters containing similar input patterns . The learning rule is identical to that of the simple competitive learning algorithm discussed in section 5 .1 .3, with the difference that the learning rate q (l) (t) is steadily reduced . Consequently, weights change much less in later iterations of this phase of the algorithm, stabilizing the first layer of weights such that the same hidden layer node continues to be the winner for similar input patterns.); the ANN comprises a plurality of layers, including an input layer, an output layer, and at least a first inner layer between the input and output layers (Mehrotra, p19, fig 1.14); and …. wherein the first node has a first input x from a second node of the ANN and a second input y from a third node of the ANN (Mehrotra, p2 fig 1.1), wherein a range of values for each of the x and y is continuous from a starting value to an ending value, and the first node has an activation function (Mehrotra, p68; Each hidden node and output node applies a sigmoid function to its net input, shown in figure 3 .3. As discussed briefly in chapter 1, the main reasons motivating the use of an S-shaped sigmoidal function are that it is continuous,….),
Mehrotra does not disclose expressly and training the ANN comprises: identifying, by the computer system, slow learning by the ANN; upon, and in response to, identification of slow learning by the ANN, adding, by the computer system, a first node, …. resuming training, by the computer system of the ANN with the first node added, wherein identifying the slow training by the ANN comprises detecting, by the computer system, that, for each iteration in multiple successive iterations.
Fahlman discloses and training the ANN comprises: identifying, by the computer system, slow learning by the ANN; upon, and in response to, identification of slow learning by the ANN, adding, by the computer system, a first node (Fahlman, p526; ‘At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’), …. resuming training, by the computer system of the ANN with the first node added, wherein identifying the slow training by the ANN comprises detecting, by the computer system, that, for each iteration in multiple successive iterations. (Fahlman, p526; ‘At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Fahlman before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate an introduction of cascade correlation training of a neural network of Fahlman. Given the advantage of the addition of a node to a neural network to alter the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra and Fahlman do not disclose expressly to the first inner layer of the ANN.
Phatak discloses to the first inner layer of the ANN. (Phatak, p931; ‘Step 2: Install the first hidden layer, adding one unit at a time. Hidden units in this layer receive connections only from the input units and are not connected to other hidden units. To install a hidden unit, it is connected to all of the network inputs and the connections are trained to maximize the correlation between its output and the residual error. Note that all the weights (input as well as output weights) associated with previously installed hidden units are held fixed when the input connections of the new unit are being trained. As a result, the new unit sees smaller residual error (at installation time) than previous units, because the previous units have already reduced the total error. The input-side weights of the new hidden unit remain frozen hereafter, as in the original Cascade Correlation algorithm. In the second phase of installing a unit, its output is connected to all the network output units. All the fan-in connections of the output-layer units (those emanating from previously installed hidden units as well as those connected to the hidden unit being currently installed), and their biases are then trained to minimize the error.’ EC: therefore, the new node is added to the first inner layer of the neural network.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman and Phatak before him before the effective filing date of the claimed invention, to modify Mehrotra and Fahlman to incorporate additional introduction information of cascade correlation training of a neural network of Phatak. Given the advantage that adding a node to the first layer of a neural network has the biggest effect on the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman and Phatak do not disclose expressly a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a change of the magnitudes of the gradients over the successive multiple iterations is less than a second threshold value. 
Lelescu discloses a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a change of the magnitudes of the gradients over the successive multiple iterations is less than a second threshold value. (Lelescu, 0307; Using the approaches described for the determination of the gradients in Eq. (2), this is performed iteratively in Eq. (1), until a stopping criterion is reached (e.g., a norm on the variation of the estimate with iteration number falls below a certain threshold, or the maximum number of iterations is attained), as shown in FIG. 9.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak and Lelescu before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman and Phatak to incorporate the employment of gradients with a threshold for use of a decision engine of Lelescu. Given that the advantage of using gradients for optimum results in regard to training is known within the art, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly such that an activation value of the first node is computed, by the computer system, using an absolute value of a difference between x and y.
Alaghi discloses such that an activation value of the first node is computed, by the computer system, using an absolute value of a difference between x and y. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's view, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 131
Mehrotra discloses wherein training the nodal network comprises back-propagating partial derivatives through the ANN. (Mehrotra, p70-73; Hence, for the calculations that follow, it is sufficient to restrict attention to the partial derivative of E with respect to ok and then differentiate ok with respect to…  EC: This is under the training algorithm of back-propagation.)

Claim(s) 110-112 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak, Lelescu and Alaghi as applied to claims 107-109, 131 above, and further in view of Wang. (‘New Efficient Design for XOR and XNOR Functions on the Transistor Level’ referred to as Wang)

Claim 110
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly the starting value for the range of values for each of x and y is zero; the ending value for the range of values for each of x and y is one; and a range of values for the activation value of the first node is continuous from zero and one.
Wang discloses the starting value for the range of values for each of x and y is zero; the ending value for the range of values for each of x and y is one (Wang, table 1; This is the first row of the truth table with the result under the XNOR column.); and a range of values for the activation value of the first node is continuous from zero and one. (Wang, table 1; The range of inputs is 0 to 1. Therefore, the starting value is 0 and the ending value is 1.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu and Alaghi to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower computation cost and faster computation, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 111
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly wherein the activation value of the first node is computed as (1 - |x - y|).
Wang discloses wherein the activation value of the first node is computed as (1 - |x - y|). (Wang, table 1; Under the XNOR column.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu and Alaghi to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 112
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly wherein the activation value of the first node is computed as |x - y|.
Wang discloses wherein the activation value of the first node is computed as |x - y|. (Wang, table 1; Under the XOR column.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu and Alaghi to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.
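For illustration only, the expressions recited in claims 111 and 112 can be sketched as follows. This is a minimal sketch and not the applicant's or any cited reference's implementation; the function names `xor_activation` and `xnor_activation` are hypothetical. It shows that |x - y| and (1 - |x - y|) reproduce the XOR and XNOR truth tables when x and y are restricted to 0 and 1, and remain defined for intermediate values between 0 and 1.

```python
def xor_activation(x: float, y: float) -> float:
    """Return |x - y|; for inputs in {0, 1} this matches the XOR truth table."""
    return abs(x - y)


def xnor_activation(x: float, y: float) -> float:
    """Return 1 - |x - y|; for inputs in {0, 1} this matches the XNOR truth table."""
    return 1.0 - abs(x - y)


# Truth table over binary inputs: (x, y, XOR-like value, XNOR-like value).
truth_table = [
    (x, y, xor_activation(x, y), xnor_activation(x, y))
    for x in (0, 1)
    for y in (0, 1)
]

if __name__ == "__main__":
    for row in truth_table:
        print(row)
    # Intermediate inputs in [0, 1] also yield values in [0, 1].
    print(xor_activation(0.25, 0.75), xnor_activation(0.25, 0.75))
```

For inputs between 0 and 1 the two functions always sum to 1, which is the complement relationship between XOR and XNOR noted in the Alaghi citation.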

Claim(s) 114 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak, Lelescu and Alaghi as applied to claims 107-109, 131 above, and further in view of Berglund. (‘The Parameterless Self-Organizing Map Algorithm’ referred to as Berglund)

Claim 114
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly wherein the nodal network comprises a self-organizing partially ordered network.
Berglund discloses wherein the nodal network comprises a self-organizing partially ordered network. (Berglund, p310; Finally, the weight update functions of the different algorithms give us the last piece of the explanation. Consider a map that receives an input far outside the area it is currently mapping, after already being partly through its annealing and, therefore, partially ordered.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Berglund before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Alaghi and Lelescu to incorporate training a neural network of Berglund. Given the advantage of obtaining a usable classifier, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 116, 121-123 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak, Lelescu and Alaghi as applied to claims 107-109, 131 above, and further in view of Homayoun. (U. S. Patent Publication 20170103236 referred to as Homayoun)

Claim 116
Mehrotra discloses a computer system comprising: …. to train a nodal network, wherein the nodal network comprises an artificial neural network (ANN) that is trained iteratively, over a series of iterations, through machine learning with a plurality of training data items (Mehrotra, p177; In the first phase, hidden layer nodes are used to divide the training set into clusters containing similar input patterns. The learning rule is identical to that of the simple competitive learning algorithm discussed in section 5.1.3, with the difference that the learning rate q (l) (t) is steadily reduced. Consequently, weights change much less in later iterations of this phase of the algorithm, stabilizing the first layer of weights such that the same hidden layer node continues to be the winner for similar input patterns.); the ANN comprises a plurality of layers, including an input layer, an output layer, and at least a first inner layer between the input and output layers (Mehrotra, p19, fig 1.14): and…. wherein; and the first node has first binary input from a second node of the ANN, a second binary input from a third node of the ANN, and an activation function, such that an activation value of the first node is an output of a Boolean logic operation applied to the first and second binary inputs. (Mehrotra, p90, p2 fig 1.1, pp2-3; ‘A neural network with one hidden layer can represent any binary function. To understand the reason for this, suppose f(x1, . . . , xn) is a function in n variables such that xi ∈ {0, 1} for i = 1, . . . , n, and the output f(x1, . . . , xn) ∈ {0, 1}.’ And ‘The "AND" of two binary inputs is an elementary logical operation, implemented in hardware using an "AND gate." If the inputs to the AND gate are x1 ∈ {0, 1} and x2 ∈ {0, 1}, the desired output is 1 if x1 = x2 = 1, and 0 otherwise.’)
Mehrotra does not disclose expressly identifying slow learning by the ANN; upon, and in response to, identification of the slow learning by the ANN, adding a first node,…. resuming training of the ANN with the first node added, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores to identify the slow training by the ANN by detecting that, for each iteration in multiple successive iterations.
Fahlman discloses identifying slow learning by the ANN; upon, and in response to, identification of the slow learning by the ANN, adding a first node (Fahlman, p526; At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’),…. resuming training of the ANN with the first node added, wherein the memory stores software that, when executed by the one or processor cores, cause the one or more processor cores to identify the slow training by the ANN by detecting that, for each iteration in multiple successive iterations. (Fahlman, p526; At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. 
This cycle repeats until the error is acceptably small (or until we give up).’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Fahlman before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate an introduction of cascade correlation training of a neural network of Fahlman. Given the advantage of the addition of a node to a neural network to alter the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification. 
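For illustration only, the "patience" test described in the quoted Fahlman passage can be sketched as follows. This is a minimal sketch under stated assumptions, not Fahlman's code: the function name `plateau_reached`, the `min_improvement` parameter, and the use of a list of per-cycle error values are all hypothetical. The quoted passage states only that a new hidden unit is considered when no significant error reduction has occurred after a certain number of training cycles.

```python
from typing import Sequence


def plateau_reached(
    errors: Sequence[float], patience: int, min_improvement: float
) -> bool:
    """Return True when the last `patience` cycles gave no significant error reduction.

    `errors` holds one training-error value per cycle, oldest first.
    `min_improvement` (an assumed parameter) defines what counts as significant.
    """
    if patience < 1 or len(errors) <= patience:
        return False  # Not enough history to judge.
    best_before = min(errors[:-patience])  # Best error before the recent window.
    best_recent = min(errors[-patience:])  # Best error inside the recent window.
    return (best_before - best_recent) < min_improvement


if __name__ == "__main__":
    history = [1.0, 0.5, 0.49, 0.49, 0.489]
    if plateau_reached(history, patience=3, min_improvement=0.05):
        print("No significant error reduction: candidate point for adding a hidden unit")
```

In a cascade-correlation style loop, a True result would be the point at which the residual error is measured and, if it is not yet acceptable, a new hidden unit is added before training resumes.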
Mehrotra and Fahlman do not disclose expressly to the first inner layer of the ANN. 
Phatak discloses to the first inner layer of the ANN. (Phatak, p931; Step 2: Install the first hidden layer, adding one unit at a time. Hidden units in this layer receive connections only from the input units and are not connected to other hidden units. To install a hidden unit, it is connected to all of the network inputs and the connections are trained to maximize the correlation between its output and the residual error. Note that all the weights (input as well as output weights) associated with previously installed hidden units are held fixed when the input connections of the new unit are being trained. As a result, the new unit sees smaller residual error (at installation time) than previous units, because the previous units have already reduced the total error. The input-side weights of the new hidden unit remain frozen hereafter, as in the original Cascade Correlation algorithm. In the second phase of installing a unit, its output is connected to all the network output units. All the fan-in connections of the output-layer units (those emanating from previously installed hidden units as well as those connected to the hidden unit being currently installed), and their biases are then trained to minimize the error. EC: Therefore, the new node is added to the first inner layer of the neural network.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman and Phatak before him before the effective filing date of the claimed invention, to modify Mehrotra and Fahlman to incorporate the additional introductory information on cascade correlation training of a neural network of Phatak. Given the advantage that adding a node to the first layer of a neural network has the biggest effect on the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman and Phatak do not disclose expressly a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value. 
Lelescu discloses a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value. (Lelescu, 0307; Using the approaches described for the determination of the gradients in Eq. (2), this is performed iteratively in Eq. (1), until a stopping criterion is reached (e.g., a norm on the variation of the estimate with iteration number falls below a certain threshold, or the maximum number of iterations is attained), as shown in FIG. 9.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak and Lelescu before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman and Phatak to incorporate the employment of gradients with a threshold for use of a decision engine of Lelescu. Given that the advantage of using gradients for optimum results in training is known within the art, one having ordinary skill in the art would have been motivated to make this obvious modification.
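For illustration only, the two-threshold limitation recited above can be sketched as follows. This is a minimal sketch of the claim language as recited, not the applicant's implementation and not the stopping criterion of Lelescu; the function name `is_slow_learning` and its parameter names are hypothetical, and the per-iteration difference of gradient norms is an assumed stand-in for the "rate of change".

```python
from typing import Sequence


def is_slow_learning(
    grad_norms: Sequence[float],
    magnitude_threshold: float,
    change_threshold: float,
) -> bool:
    """Check the two conditions over a window of successive iterations.

    `grad_norms` holds the gradient magnitude of the objective at each
    successive iteration. Returns True only if every magnitude is below
    `magnitude_threshold` (first threshold) and every iteration-to-iteration
    change in magnitude is below `change_threshold` (second threshold).
    """
    if len(grad_norms) < 2:
        return False  # A change cannot be measured from a single iteration.
    small = all(g < magnitude_threshold for g in grad_norms)
    flat = all(
        abs(later - earlier) < change_threshold
        for earlier, later in zip(grad_norms, grad_norms[1:])
    )
    return small and flat


if __name__ == "__main__":
    window = [0.010, 0.009, 0.0095]
    print(is_slow_learning(window, magnitude_threshold=0.05, change_threshold=0.005))
```

Both conditions must hold together: a small but still rapidly changing gradient, or a steady but large gradient, would not satisfy the sketch.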
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly one or more processor cores; and a memory in communication with the one or more processor cores, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores ….and the memory stores software that, when executed by the one or more processor cores, causes the one or more processor cores to train the ANN by.
Homayoun discloses one or more processor cores; and a memory in communication with the one or more processor cores, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores (Homayoun, 0104; Further, one skilled in the art will appreciate that the systems and methods disclosed herein can be implemented via a general-purpose computing device in the form of a computer 1501. The components of the computer 1501 can comprise, but are not limited to, one or more processors 1503, a system memory 1512, and a system bus 1513 that couples various system components including the one or more processors 1503 to the system memory 1512. The system can utilize parallel computing.)….and the memory stores software that, when executed by the one or more processor cores, causes the one or more processor cores to train the ANN by. (Homayoun, 0148; For purposes of illustration, application programs and other executable program components such as the operating system 1505 are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 1501, and are executed by the one or more processors 1503 of the computer. An implementation of the security software 1506 can be stored on or transmitted across some form of computer readable media. Any of the disclosed methods can be performed by computer readable instructions embodied on computer readable media.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu and Alaghi to incorporate generic computer hardware and the concept to starting building a model with nothing established of Homayoun. 
Given the advantage of being able to employ the invention without being limited by an established template, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 121
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly wherein the Boolean logic operation is an XOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are unequal; and have an activation value of 0 when the first and second binary inputs are equal.
Alaghi discloses wherein the Boolean logic operation is an XOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are unequal; and have an activation value of 0 when the first and second binary inputs are equal. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's point of view, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 122
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly wherein the Boolean logic operation is an XNOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are equal; and have an activation value of 0 when the first and second binary inputs are unequal.
Alaghi discloses wherein the Boolean logic operation is an XNOR operation such that the first node is configured to: have an activation value of 1 when the first and second binary inputs are equal; and have an activation value of 0 when the first and second binary inputs are unequal. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's point of view, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 123
Mehrotra discloses to train a nodal network, wherein the nodal network comprises: an artificial neural network (ANN) that is trained iteratively, over a series of multiple iterations, through machine learning with a plurality of training data items (Mehrotra, p177; In the first phase, hidden layer nodes are used to divide the training set into clusters containing similar input patterns. The learning rule is identical to that of the simple competitive learning algorithm discussed in section 5.1.3, with the difference that the learning rate q (l) (t) is steadily reduced. Consequently, weights change much less in later iterations of this phase of the algorithm, stabilizing the first layer of weights such that the same hidden layer node continues to be the winner for similar input patterns.); the ANN comprises a plurality of layers, including an input layer, an output layer, and at least a first inner layer between the input and output layers (Mehrotra, p19, fig 1.14); and…. wherein the first node has a first input x from a second node of the ANN and a second input y from a third node of the ANN (Mehrotra, p2 fig 1.1), wherein a range of values for each of the inputs x and y is continuous from a starting value to an ending value, and the first node has an activation function. (Mehrotra, p68; Each hidden node and output node applies a sigmoid function to its net input, shown in figure 3.3. As discussed briefly in chapter 1, the main reasons motivating the use of an S-shaped sigmoidal function are that it is continuous,….)
Mehrotra does not disclose expressly identifying slow learning by the ANN; upon, and in response to, identification of the slow learning by the ANN, adding a first node …. resuming training of the ANN with the first node added, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores to identify the slow training by the ANN by detecting that, for each iteration in multiple successive iterations.
Fahlman discloses identifying slow learning by the ANN; upon, and in response to, identification of the slow learning by the ANN, adding a first node (Fahlman, p526; At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. This cycle repeats until the error is acceptably small (or until we give up).’)…. resuming training of the ANN with the first node added, wherein the memory stores software that, when executed by the one or processor cores, cause the one or more processor cores to identify the slow training by the ANN by detecting that, for each iteration in multiple successive iterations. (Fahlman, p526; At some point, this training will approach an asymptote. When no significant error reduction has occurred after a certain number of training cycles (controlled by a "patience" parameter set by the operator), we run the network one last time over the entire training set to measure the error. If we are satisfied with the network's performance, we stop; if not, we attempt to reduce the residual errors further by adding a new hidden unit to the network. The unit-creation algorithm is described below. The new unit is added to the net, its input weights are frozen, and all the output weights are once again trained using quickprop. 
This cycle repeats until the error is acceptably small (or until we give up).’) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra and Fahlman before him before the effective filing date of the claimed invention, to modify Mehrotra to incorporate an introduction of cascade correlation training of a neural network of Fahlman. Given the advantage of the addition of a node to a neural network to alter the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Mehrotra and Fahlman do not disclose expressly to the first inner layer of the ANN.
Phatak discloses to the first inner layer of the ANN. (Phatak, p931; Step 2: Install the first hidden layer, adding one unit at a time. Hidden units in this layer receive connections only from the input units and are not connected to other hidden units. To install a hidden unit, it is connected to all of the network inputs and the connections are trained to maximize the correlation between its output and the residual error. Note that all the weights (input as well as output weights) associated with previously installed hidden units are held fixed when the input connections of the new unit are being trained. As a result, the new unit sees smaller residual error (at installation time) than previous units, because the previous units have already reduced the total error. The input-side weights of the new hidden unit remain frozen hereafter, as in the original Cascade Correlation algorithm. In the second phase of installing a unit, its output is connected to all the network output units. All the fan-in connections of the output-layer units (those emanating from previously installed hidden units as well as those connected to the hidden unit being currently installed), and their biases are then trained to minimize the error. EC: Therefore, the new node is added to the first inner layer of the neural network.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman and Phatak before him before the effective filing date of the claimed invention, to modify Mehrotra and Fahlman to incorporate the additional introductory information on cascade correlation training of a neural network of Phatak. Given the advantage that adding a node to the first layer of a neural network has the biggest effect on the answer space of the neural network, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman and Phatak do not disclose expressly a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value. 
Lelescu discloses a magnitude of a gradient for an objective for the ANN is less than a first threshold value, and a rate of change of the magnitude of the gradient over the successive multiple iterations is less than a second threshold value. (Lelescu, 0307; Using the approaches described for the determination of the gradients in Eq. (2), this is performed iteratively in Eq. (1), until a stopping criterion is reached (e.g., a norm on the variation of the estimate with iteration number falls below a certain threshold, or the maximum number of iterations is attained), as shown in FIG. 9.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak and Lelescu before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman and Phatak to incorporate the employment of gradients with a threshold for use of a decision engine of Lelescu. Given that the advantage of using gradients for optimum results in training is known within the art, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman, Phatak and Lelescu do not disclose expressly such that an activation value of the first node is computed using an absolute value of a difference between x and y.
Alaghi discloses such that an activation value of the first node is computed using an absolute value of a difference between x and y. (Alaghi, p2, fig 4; In other words, the output represents |px-py|. This is an exclusive-OR (XOR) gate; an exclusive-NOR (XNOR) gate would be 1 - |px-py|.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu and Alaghi before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak and Lelescu to incorporate the algebraic version of XOR and XNOR of Alaghi. Given the advantage of ease of understanding from a programmer's point of view, one having ordinary skill in the art would have been motivated to make this obvious modification.
Mehrotra, Fahlman, Phatak, Lelescu and Alaghi do not disclose expressly one or more processor cores; and a memory in communication with the one or more processor cores, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores …. the memory stores software that, when executed by the one or more processor cores, causes the one or more processor cores to train the ANN by.
Homayoun discloses a computer system comprising:  one or more processor cores; and a memory in communication with the one or more processor cores, wherein the memory stores software that, when executed by the one or more processor cores, cause the one or more processor cores (Homayoun, 0104; Further, one skilled in the art will appreciate that the systems and methods disclosed herein can be implemented via a general-purpose computing device in the form of a computer 1501. The components of the computer 1501 can comprise, but are not limited to, one or more processors 1503, a system memory 1512, and a system bus 1513 that couples various system components including the one or more processors 1503 to the system memory 1512. The system can utilize parallel computing.)…. the memory stores software that, when executed by the one or more processor cores, causes the one or more processor cores to train the ANN by. (Homayoun, 0148; For purposes of illustration, application programs and other executable program components such as the operating system 1505 are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 1501, and are executed by the one or more processors 1503 of the computer. An implementation of the security software 1506 can be stored on or transmitted across some form of computer readable media. Any of the disclosed methods can be performed by computer readable instructions embodied on computer readable media.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu and Alaghi to incorporate generic computer hardware and the concept to starting building a model with nothing established of Homayoun. 
Given the advantage of being able to employ the invention without being limited by an established template, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 117, 124-126 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun as applied to claims 116, 121-123 above, and further in view of Wang. (‘New Efficient Design for XOR and XNOR Functions on the Transistor Level’ referred to as Wang)

Claim 117
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly possible values for each of the first and second binary inputs is zero and one; and possible values for the activation value of the first node are zero and one.
Wang discloses possible values for each of the first and second binary inputs is zero and one; and possible values for the activation value of the first node are zero and one. (Wang, table 1) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 124
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly the starting value for the range of values for each of x and y is zero; the ending value for the range of values for each of x and y is one; and a range of values for the activation value of the first node is continuous from zero and one.
Wang discloses the starting value for the range of values for each of x and y is zero; the ending value for the range of values for each of x and y is one (Wang, table 1; This is the first row of the truth table with the result under the XNOR column.); and a range of values for the activation value of the first node is continuous from zero and one. (Wang, table 1; The output of an XNOR is binary.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 125
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly wherein the activation value of the first node is computed as (1 - |x - y|).
Wang discloses wherein the activation value of the first node is computed as (1 - |x - y|). (Wang, table 1; This is under the XNOR column.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 126
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly wherein the activation value of the first node is computed as |x - y|.
Wang discloses wherein the activation value of the first node is computed as |x - y|. (Wang, table 1; This is under the XOR column.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Wang before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate binary input and output using XOR or XNOR properties, of Wang. Given the advantage of lower and faster computation costs, one having ordinary skill in the art would have been motivated to make this obvious modification.
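
For illustration only, and not as part of the cited references or the claims, the expressions recited in claims 125 and 126 can be compared against the XNOR and XOR columns of a two-input truth table with a short Python sketch. The function names soft_xor, soft_xnor and truth_table are hypothetical and are introduced here solely for this example.

```python
from itertools import product


def soft_xor(x: float, y: float) -> float:
    """Activation |x - y|; matches XOR when x and y are each 0 or 1."""
    return abs(x - y)


def soft_xnor(x: float, y: float) -> float:
    """Activation 1 - |x - y|; matches XNOR when x and y are each 0 or 1."""
    return 1 - abs(x - y)


def truth_table():
    """Return rows (x, y, |x - y|, 1 - |x - y|) for all binary input pairs."""
    return [
        (x, y, soft_xor(x, y), soft_xnor(x, y))
        for x, y in product((0, 1), repeat=2)
    ]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

For inputs restricted to zero and one, each row agrees with the corresponding XOR and XNOR entries of a standard truth table. For inputs anywhere between zero and one, the same expressions return values between zero and one; for example, soft_xnor(0.25, 0.75) evaluates to 0.5.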

Claim(s) 119 and 128 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun as applied to claims 116 and 121-123 above, and further in view of Berglund (“The Parameterless Self-Organizing Map Algorithm,” referred to as Berglund).

Claim 119
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly wherein the ANN comprises a self organizing partially ordered network. 
Berglund discloses wherein the ANN comprises a self organizing partially ordered network. (Berglund, p310; Finally, the weight update functions of the different algorithms give us the last piece of the explanation. Consider a map that receives an input far outside the area it is currently mapping, after already being partly through its annealing and, therefore, partially ordered.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Berglund before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate training a neural network of Berglund. Given the advantage of obtaining a usable classifier, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 128
Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun do not disclose expressly wherein the ANN comprises a self organizing partially ordered network.
Berglund discloses wherein the ANN comprises a self organizing partially ordered network. (Berglund, p310; Finally, the weight update functions of the different algorithms give us the last piece of the explanation. Consider a map that receives an input far outside the area it is currently mapping, after already being partly through its annealing and, therefore, partially ordered.) It would have been obvious to one having ordinary skill in the art, having the teachings of Mehrotra, Fahlman, Phatak, Lelescu, Alaghi, Homayoun and Berglund before him before the effective filing date of the claimed invention, to modify Mehrotra, Fahlman, Phatak, Lelescu, Alaghi and Homayoun to incorporate training a neural network of Berglund. Given the advantage of obtaining a usable classifier, one having ordinary skill in the art would have been motivated to make this obvious modification.

9.	Claims 102-103, 105, 107-112, 114, 116-117, 119, 121-126, 128, 130-131 are rejected.
	
Conclusion	
10.	The prior art of record and not relied upon is considered pertinent to the applicant’s disclosure.
	-Search terms: Cascade Correlation
	-U. S. Patent 5745649: Lubensky

Correspondence Information
11.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Michael Huntley can be reached at (303) 297-4307.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129