DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/3/2022 has been entered. 
Claims 1, 8, and 15 have been amended. Claims 1-20 have been examined.

Response to Arguments/Amendments
The amendments of claim 8 have overcome the prior rejection under 35 USC § 112. The rejection of claim 8 has been withdrawn.
Applicant’s arguments, see p. 15, filed 11/23/2021, with respect to the rejections of claims 1, 8, and 15 under 35 USC § 103, have been fully considered and are persuasive.  Therefore, the rejections have been withdrawn.  However, upon further consideration, a new ground of rejection is made in view of U.S. Patent 5,113,483 to Keeler et al.
Applicant's remaining arguments filed 11/23/2021 have been fully considered but they are not persuasive. 
On p. 14 filed 11/23/2021, Applicant argues that Eberhart's slope vectors "do not appear to work with the alleged activation function in Esterline," and render Esterline unsuitable for its intended purpose. Applicant fails to explain why Eberhart's slope vectors would fail to work with Esterline. Both Eberhart and Esterline utilize slope/scale variables for use with activation functions. It is not clear why using Eberhart's node-specific slope factors would fail to work with Esterline's activation function.
On p. 14 filed 11/23/2021, Applicant argues that the cited art fails to teach scale nodes connected to neurons. However, the depiction of a variable node physically connected to a neuron is merely an abstraction representing a data connection to a function. Esterline Fig. 8 and ¶ 0102 teach that neurons may include connections to a weight factor, a bias "b" and a bias "a." ¶ 0103 describes this mathematically and includes the scale/slope variable alpha. These variables are equivalent to nodes that are connected to a neuron. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over United States Patent Application Publication 2014/0337261 by Esterline (“Esterline”) in view of U.S. Patent 5,687,291 to Smyth et al. (“Smyth”), U.S. Patent 6,516,309 to Eberhart et al. (“Eberhart”), and U.S. Patent 5,113,483 to Keeler et al. (“Keeler”).

	In regard to claim 1, Esterline discloses:
1. A computer-implemented method for training a neural network (NN) comprising a plurality of layers including hidden layers having at least one hidden layer and a subsequent hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the method comprising: See Esterline, Fig. 1, depicting a NN with a plurality of layers. Also see ¶ 0077, e.g. “In some embodiments, there are multiple hidden layers.” Also see ¶ 0092, e.g. “the network may be trained using supervised learning or unsupervised learning.” 
determining, for a current layer of the NN for a current iteration of the training, 
a set of inputs, the set of inputs including a set of training inputs or a set of activation results associated with the respective set of nodes of a prior layer of the NN, See Esterline, Figs. 2 and 3A, depicting inputs. Also, see ¶ 0079, e.g. “inputs.”
a bias parameter, See Esterline, Fig. 3A, depicting bias “b” along with ¶ 0022, e.g. “bias input.”
a scale parameter, and See Esterline, ¶ 0104, e.g. “the values of one or more of the variables α, … are iteratively calculated during the training phase of the ANN.” 
a respective set of weights to be applied to the set of inputs, See Esterline, Fig. 3A, depicting input weights, along with ¶ 0010. … the bias parameter, and See Esterline, Fig. 7, depicting weights applied to a bias parameter. Also see ¶ 0091. … the scale parameter See Esterline, equation 3 along with ¶ 0104, e.g. “the values of … ω1, … are iteratively calculated during the training phase of the ANN.” … for each node of the current layer of the NN;  See at least ¶ 0096, e.g. “each neuron can be solved”
determining, for a particular node of the current layer during the current iteration of the training, a combined input based at least in part on the set of inputs, the respective set of weights associated with the particular node, and the bias parameter; See Esterline, Fig. 3A, depicting combined input Z.
determining a weighted scale parameter by applying a corresponding weight from the respective set of weights to the scale parameter in preparation for execution of an activation function, wherein the corresponding weight is learned from one or more prior iterations of the training, See Esterline, p. 8, equation 3, which shows a scale factor 1. Also see ¶ 0104, e.g. “the values of one or more of the variables α, ω1, and b are iteratively calculated during the training phase of the ANN.” Note that in the equation, the weighted scale parameter must be determined/calculated prior to, i.e. in preparation for, subsequent execution of the activation function which utilizes the weighted scale parameter. Without first determining the weighted scale parameter, the activation function cannot be executed.
executing, for the current iteration of the training, an activation function for the particular node, using the combined input and the weighted scale parameter to generate an activation result for the particular node; See Esterline, Fig. 3A, depicting an activation function that utilizes a summation of weighted inputs and bias b. Also see ¶ 0080, e.g. “As is shown in FIG. 3A, input vectors are scaled with individual synaptic weights ω1 through ωn, and fed into a linear summation module. The inputs are linearly summed together along with a bias term, `b`. In FIG. 3A, the resulting linear sum is denoted as `z`. The linear sum is then fed through an activation function to produce an output `y.`” Also see Table A on p. 6, describing a uni-polar sigmoid, e.g.:
$$ y = \frac{1}{1 + e^{-\alpha(z)}} $$
Also see ¶ 0103-0107, e.g. “the values of one or more of the variables α, ω1, and b are iteratively calculated.” … “By changing the sign of α, the sign of the slope of the activation function output may also be changed (e.g., from positive slope to negative slope). In some embodiments, the slope of the sigmoid is set at either 1 or -1 to obtain the direction of slope desired,” Also see equation (3) on p. 8 at ¶ 0103, depicting an activation function using a weighted scale parameter, e.g.:
$$ \phi = \frac{1}{1 + e^{-\alpha(\omega x + b)}} $$
determining whether the current layer is the final layer; and outputting, based at least in part on determining whether the current layer is the final layer, the activation result as a classifier output of the NN or providing the activation result as input to the next layer of the NN. See Esterline, ¶ 0078, e.g. “In the system of FIG. 1, the far-right layer is an output layer. A single sweep through the network from left to right results in the assignment of a value to each output node.” Also see ¶ 0092, e.g. “the output classifies the input signal to the desired output.”
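For illustration only, the forward-pass limitations mapped above (combined input, weighted scale parameter, activation, and the final-layer determination) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Esterline or of the application: the dictionary layout and the names `scaled_sigmoid`, `forward`, and `alpha_weight` are hypothetical, and a single bias and scale parameter per layer is assumed for brevity.

```python
import math


def scaled_sigmoid(z, alpha):
    """Uni-polar sigmoid with a slope/scale factor: 1 / (1 + exp(-alpha * z))."""
    return 1.0 / (1.0 + math.exp(-alpha * z))


def forward(inputs, layers):
    """Propagate inputs through the layers (hypothetical layout, for illustration).

    Each layer is a dict with:
      "weights":      one list of input weights per node
      "bias":         bias parameter for the layer
      "alpha":        scale parameter for the layer
      "alpha_weight": learned weight applied to the scale parameter
    """
    activations = list(inputs)
    for index, layer in enumerate(layers):
        # The weighted scale parameter is computed before the activation runs.
        weighted_alpha = layer["alpha_weight"] * layer["alpha"]
        results = []
        for node_weights in layer["weights"]:
            # Combined input: weighted sum of the inputs plus the bias.
            z = sum(w * x for w, x in zip(node_weights, activations)) + layer["bias"]
            results.append(scaled_sigmoid(z, weighted_alpha))
        if index == len(layers) - 1:
            # Final layer: the activation results are the classifier output.
            return results
        # Otherwise the results become the inputs of the next layer.
        activations = results
    return activations
```

Changing the sign of the scale factor mirrors the curve, so `scaled_sigmoid(z, a) + scaled_sigmoid(z, -a)` equals 1 for any `z`, which is consistent with the slope-direction remark quoted from ¶ 0103-0107.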
wherein the NN comprises an input layer having first nodes including a first bias parameter node and a first scale parameter node, the at least one hidden layer having second nodes including a second bias parameter node and a second scale parameter node, the subsequent hidden layer having third nodes including a third bias parameter node and a third scale parameter node, and the final layer having final nodes, See Esterline, Fig. 1, depicting a neural network. Also see ¶ 0077-0078, discussing input and hidden layers. Also see Fig. 3A, depicting an input layer having a bias parameter b, as well as Fig. 7, depicting first and second layers of a neural network, where the inclusion of weight parameters applied to the bias parameter can be conceptualized as first and second bias nodes. That is a first bias node corresponds to the product bω02 and a second bias node corresponds to bω02. Esterline also includes the notion of a scale factor. See p. 8, equation 3 in ¶ 0103, which shows an activation function including a scale factor α. This is representative of a scale parameter node coupled to a node of a neural network. Note that the equation in ¶ 0103 applies to any number of neurons in a 
While providing a mathematical basis for interconnection of bias and scale parameters/nodes, Esterline does not expressly depict representation of such hyperparameters as nodes in a graph. However, Smyth clearly depicts a bias node at each layer in a neural network. See Smyth, Fig. 7, depicting multiple layers of a neural network, each having a bias node. Also, Eberhart provides a similar description in terms of scale vectors. See Fig. 1 along with col. 10, lines 36-63, generally describing a 3 layer neural network having a first scale node A and a second scale node B. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s multiple hidden layers of bias and scale parameters in terms of nodes in a neural network as depicted by Smyth and described by Eberhart, in order to provide a data abstraction permitting analysis and enhanced understanding of a neural network as known by those of ordinary skill in the art. Such customization provides the ability to tune a network in order to utilize appropriate values as suggested by at least Eberhart (see Eberhart col. 8, lines 9-21, and col. 10, lines 30-35).
wherein both the first bias parameter node and the first scale parameter node in the first nodes are coupled to each one of the second nodes excluding the second bias parameter node and the second scale parameter node in the second nodes, See Esterline, Fig. 7, depicting a bias parameter node coupled to first and second nodes. Also see p. 8, equation 3 in ¶ 0103, which shows an activation function including a scale 
Esterline does not expressly disclose: the second scale parameter node being coupled to each one of the third nodes in the subsequent hidden layer excluding the third bias parameter node and the third scale parameter node. However, as cited above, Eberhart teaches the use of scale parameters with each processing element in a neural network (e.g. see col. 10, lines 36-63). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s multiple hidden layers with Eberhart’s slope vectors in order to scale inputs of a hidden processing element as suggested by Eberhart (see col. 6, lines 39-48).
the first bias parameter node and the first scale parameter node being respectively different from and in a separate node from the second bias parameter node and the second scale parameter node, See Esterline, Fig. 7, depicting processing nodes each having respectively different applied biases. However, Smyth and Eberhart also teach nodes with respectively different and separate embodiments. See Smyth, Fig. 7, depicting separate bias nodes. Also see Eberhart, col. 10, lines 36-63, describing scale nodes A and B.
the second scale parameter node in the at least one hidden layer being different from and in a separate node from the third scale parameter node in the subsequent hidden layer, However, this is taught in combination with Eberhart. See Eberhart, col. 10, lines 36-63, describing numerous scale parameters (i.e. “nodes”), each distinct from the other (i.e. α, α2, etc.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s multiple hidden layers with Eberhart’s multiple distinct scale parameters in order to scale inputs of a hidden processing element as suggested by Eberhart (see col. 6, lines 39-48).
Esterline does not expressly disclose: wherein the second scale parameter node and the third scale parameter node are updated by backpropagation respectively in the at least one hidden layer and the subsequent hidden layer. Note that backpropagation is well known and utilized by Esterline. See Esterline, ¶ 0092, e.g. “backpropagation.” Keeler also teaches the use of backpropagation to find activation parameter values. See Keeler, col. 2, lines 47-52, e.g. “Having done this, the internal weights and activation parameters of the hidden layer(s) are modified by a learning algorithm to provide an output pattern which more closely approximates the desired output pattern, while minimizing the error over the spectrum of input patterns.” Col. 8, lines 69 – col. 9, line 2, e.g. “Through Back Propagation, the weights wij and the parameters associated with the activation function, μhi and σhi, are varied to minimize the error function.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s scale parameter with Keeler’s utilization of 

	In regard to claim 2, Esterline discloses:
2. The computer-implemented method of claim 1, wherein it is determined that the current layer is the final layer, and wherein the activation result is output as the classifier output of the NN. See Esterline, ¶ 0078, e.g. “In the system of FIG. 1, the far-right layer is an output layer. A single sweep through the network from left to right results in the assignment of a value to each output node.” Also see ¶ 0092, e.g. “the output classifies the input signal to the desired output”

	In regard to claim 3, Esterline discloses:
3. The computer-implemented method of claim 2, further comprising: determining a difference between an actual target output and the classifier output; and updating the respective set of weights to be applied during a next iteration of the training based at least in part on the difference between the actual target output and the classifier output. See Esterline, ¶ 0093, e.g. “an error signal representing the difference between the reference signal and the output signal is analyzed … a computer processor solves for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency.”
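As a hedged illustration of the error-driven update recited in claim 3, the sketch below performs one gradient-descent step for a single neuron that uses the scaled sigmoid. It assumes a squared-error loss E = ½(y − t)², which the record does not specify; the function name `train_step` and the learning rate `lr` are hypothetical. The update of `alpha` is included only to show how an activation parameter can be adjusted by the same backpropagated error, as discussed in the rejection of claim 1.

```python
import math


def train_step(weights, bias, alpha, inputs, target, lr=0.1):
    """One gradient step for y = 1 / (1 + exp(-alpha * z)), z = w . x + b.

    Assumes squared error E = 0.5 * (y - target) ** 2 (an illustrative choice).
    Returns the updated weights, bias, alpha, and the pre-update error y - target.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    y = 1.0 / (1.0 + math.exp(-alpha * z))
    error = y - target                 # difference between output and target
    delta = error * y * (1.0 - y)      # dE/d(alpha * z) via the sigmoid derivative
    new_weights = [w - lr * delta * alpha * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta * alpha
    new_alpha = alpha - lr * delta * z  # dE/dalpha = delta * z
    return new_weights, new_bias, new_alpha, error
```

Calling `train_step` repeatedly on the same sample moves the output toward the target, so the magnitude of the returned error shrinks from one call to the next.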


	In regard to claim 8, Esterline discloses:
8. A system for training a neural network (NN) comprising a plurality of layers including hidden layers having at least one hidden layer and a subsequent hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the system comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to:  See Esterline, Fig. 17, depicting a system including memory and a processor.  Also see ¶ 0077, e.g. “In some embodiments, there are multiple hidden layers.”
All further limitations have been addressed in the above rejection of claim 1. 

	In regard to claims 9-10, parent claim 8 is addressed above. All further limitations have been addressed in the above rejections of claims 2-3, respectively.

	In regard to claim 15, Esterline discloses:
15. A computer program product for training a neural network (NN) comprising a plurality of layers including hidden layers having at least one hidden layer and a subsequent hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the computer program product comprising a non-transitory storage medium readable by a processing circuit, the storage medium storing instructions executable by the processing circuit to cause a method to be performed, the method comprising:  See Esterline, Fig. 17 along with ¶ 0175, e.g. “The steps of a method, process, or algorithm 
All further limitations have been addressed in the above rejection of claim 1.

In regard to claims 16-17, parent claim 15 is addressed above. All further limitations have been addressed in the above rejections of claims 2-3, respectively.

Claims 4-5, 11-12, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Smyth, Eberhart, and Keeler as applied above, and further in view of United States Patent Application Publication 2005/0015251 by Pi et al. (“Pi”).

	In regard to claim 4, Esterline discloses:
4. The computer-implemented method of claim 3, further comprising: determining a cumulative error associated with all nodes in the NN; See Esterline, ¶ 0093, e.g. “the squared error at each measured temperature point could be summed together, with the minimum squared error over the temperature range being solved for, as well as the corresponding weights that produce such error.”
Esterline does not expressly disclose: determining that the cumulative error exceeds a threshold value; and determining that the next iteration of the training should be performed in response to determining that the cumulative error exceeds the threshold value. However, Pi teaches iterative training and error comparison with a threshold. See 

	In regard to claim 5, Esterline and Pi also teach:
5. The computer-implemented method of claim 4, wherein the updating is performed in response to determining that the next iteration of the training should be performed. See Pi, ¶ 0023 and 0034, e.g. “back-propagates the error signal.”
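The control flow recited in claims 4 and 5 (perform a further training iteration, including the update, only while the cumulative error exceeds a threshold) can be sketched as follows. This is an illustrative sketch only and is not taken from Esterline or Pi: the callbacks `evaluate` and `update`, the function name, and the `max_iterations` safety cap are assumptions, and the toy one-parameter model stands in for a real network.

```python
def train_with_error_threshold(evaluate, update, threshold, max_iterations=1000):
    """Repeat updates while the cumulative error exceeds the threshold.

    evaluate() returns the current cumulative error; update() applies one
    training update. Both are hypothetical callbacks supplied by the caller.
    """
    iterations = 0
    cumulative_error = evaluate()
    while cumulative_error > threshold and iterations < max_iterations:
        update()  # performed only because another iteration is required
        iterations += 1
        cumulative_error = evaluate()
    return iterations, cumulative_error


# Toy usage: a single parameter w is pulled halfway toward 1.0 on each update.
state = {"w": 0.0}
iterations, final_error = train_with_error_threshold(
    evaluate=lambda: (state["w"] - 1.0) ** 2,
    update=lambda: state.update(w=state["w"] + 0.5 * (1.0 - state["w"])),
    threshold=0.01,
)
```

In the toy usage the squared error falls by a factor of four per update (1.0, 0.25, 0.0625, ...), so the loop stops as soon as the error is at or below the threshold.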

In regard to claims 11-12, parent claim 10 is addressed above. All further limitations have been addressed in the above rejections of claims 4-5, respectively.

In regard to claims 18-19, parent claim 17 is addressed above. All further limitations have been addressed in the above rejections of claims 4-5, respectively.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Smyth, Eberhart, and Keeler as applied above, and further in view of United States Patent Application Publication 2016/0171974 by Hannun et al. (“Hannun”).

In regard to claim 6, the rejection of parent claim 2 is provided above. Esterline does not expressly disclose claim 6. However, Hannun teaches: 
6. The computer-implemented method of claim 2, further comprising: determining the set of inputs from an acoustic signal; and decoding a set of classifier outputs including the classifier output to determine a character string corresponding to the acoustic signal. See Hannun, Fig. 5 and ¶ 0057, e.g. “find the sequence of characters.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s output with Hannun’s character output in order to provide a simple, high performance speech recognition system as suggested by Hannun (see ¶ 0030).

In regard to claim 13, parent claim 9 is addressed above. All further limitations have been addressed in the above rejection of claim 6.

In regard to claim 20, parent claim 16 is addressed above. All further limitations have been addressed in the above rejection of claim 6.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Smyth, Eberhart, and Keeler as applied above, and further in view of United States Patent 8,418,249 by Nucci et al. (“Nucci”).

In regard to claim 7, parent claim 1 is addressed above. Esterline does not expressly disclose the limitations of claim 7. However, Nucci teaches the following:
7. The computer-implemented method of claim 1, further comprising: determining that a threshold number of iterations of the training have been performed, wherein a respective final set of weights to be applied when executing the activation function for each node in the NN is obtained after performing a final iteration of the training. See Nucci, col. 18, line 66 – col. 19, line 1, e.g. “When the maximum number of iterations T is reached, a final kernel is computed.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s iterations with Nucci’s threshold in order to decide when a sufficient computation is accomplished as is essentially suggested by Nucci and known to those of ordinary skill in the art (see Nucci, at least col. 15, lines 57-65).
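For illustration, the iteration-count stopping condition of claim 7 can be sketched as a loop that runs a fixed number of iterations and returns whatever weights result from the final one. This is a minimal sketch, not Nucci's kernel computation or the applicant's method; `train_fixed_iterations` and the `update` callback are hypothetical names.

```python
def train_fixed_iterations(initial_weights, update, max_iterations):
    """Apply update() exactly max_iterations times and return the final weights.

    update(weights) returns a new list of weights (hypothetical callback).
    """
    weights = list(initial_weights)
    for _ in range(max_iterations):
        weights = update(weights)
    # The weights after the final iteration are the ones used at inference.
    return weights
```

For example, `train_fixed_iterations([0.0], lambda ws: [w + 1.0 for w in ws], 3)` applies the update three times to the single weight.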

In regard to claim 14, parent claim 8 is addressed above. All further limitations have been addressed in the above rejection of claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
U.S. Patent Application Publication 2009/0157578 by Sellamanickam et al. teaches use of an optimizable scaling parameter. See ¶ 0032. It is noted that weights are often used to adjust or “optimize” a given parameter.

U.S. Patent Application Publication 2016/0328643 by Liu et al. depicts bias nodes labeled as “+1.” See Liu, Fig. 2. Also see ¶ 0024, e.g. “The bias term and the weights between the input layer 202 and the hidden layer 204 are learned in the training of the AE 200, for example using a back-propagation algorithm.”
"The influence of the sigmoid function parameters on the speed of backpropagation learning" by Han et al. teaches the use of different slope (i.e. scale) parameters for different hidden layers. See section 3, p. 198, e.g. “From (3.6) it is straightforward shown that by adopting different parameters in different hidden layers (i.e. tuning the slopes or dynamic ranges of the functions) we can enlarge the deltas for every hidden layer respectively.” 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/James D. Rutten/Primary Examiner, Art Unit 2121