DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 8, and 15 have been amended. Claims 1-20 have been examined.

Response to Arguments
Applicant's remaining arguments filed 1/28/2021 have been fully considered but they are not persuasive. 
On pp. 10-11 of the remarks, Applicant essentially argues that cited art of record Fogel fails to teach the newly amended limitations regarding “wherein each of the plurality of layers excluding a final layer comprises at least one first node as the scale parameter connected by interconnection weights to each node in a next layer of the NN, wherein each of the plurality of layers excluding the final layer comprises at least one second node as the bias parameter connected by interconnection weights to each node in the next layer of the NN.” Applicant notes that cited art of record Esterline has been cited in support of “weights, bias term, and scale parameter,” but argues that the combination fails to disclose the above features. However, further review of the cited art reveals that Esterline discloses an activation function which utilizes a scale factor that can be used for a given node in a given layer (see at least equation 3 in ¶ 0103). Fogel also teaches that each node in a layer other than the final output layer is connected to each node in a subsequent layer (see Fogel, col. 7, lines 62-65 and col. 8, lines 7-11). Fogel teaches that such interconnection allows for the training of a network and minimization of 
From the bottom of p. 11 through p. 12, Applicant essentially argues that the cited art fails to teach “the at least one second node as the bias parameter in one layer being different from the at least one second node as the bias parameter in the next layer.” However, Esterline appears to disclose this limitation. See Esterline, ¶ 0108 along with Figs. 7 and 35. Therefore, this argument is not persuasive.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over United States Patent Application Publication 2014/0337261 by Esterline (“Esterline”) in view of U.S. Patent 5,214,746 to Fogel et al. (“Fogel”) and U.S. Patent Application Publication 2015/0371132 by Gemello et al. (“Gemello”).

	In regard to claim 1, Esterline discloses:
1. A computer-implemented method for training a neural network (NN) comprising a plurality of layers including at least one hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the method comprising: See Esterline, Fig. 1, depicting a NN with a plurality of layers. Also see ¶ 0092, e.g. “the network may be trained using supervised learning or unsupervised learning.”
determining, for a current layer of the NN for a current iteration of the training, 
i) a set of inputs, the set of inputs including a set of training inputs or a set of activation results associated with the respective set of nodes of a prior layer of the NN, See Esterline, Figs. 2 and 3A, depicting inputs. Also, see ¶ 0079, e.g. “inputs.”
ii) a bias parameter, See Esterline, Fig. 3A, depicting bias “b” along with ¶ 0022, e.g. “bias input.”
iii) a scale parameter, and See Esterline, ¶ 0104, e.g. “the values of one or more of the variables α, … are iteratively calculated during the training phase of the ANN.” 
iv) a respective set of weights to be applied to the set of inputs, See Esterline, Fig. 3A, depicting input weights, along with ¶ 0010. … the bias parameter, and See Esterline, Fig. 7, depicting weights applied to a bias parameter. Also see ¶ 0091. … the scale parameter See Esterline, equation 3 along with ¶ 0104, e.g. “the values of … ω1, … are iteratively calculated during the training phase of the ANN.” … for each node of the current layer of the NN;  See at least ¶ 0096, e.g. “each neuron can be solved”
determining, for a particular node of the current layer during the current iteration of the training, a combined input based at least in part on the set of inputs, the respective set of weights associated with the particular node, and the bias parameter; See Esterline, Fig. 3A, depicting combined input Z.
determining a weighted scale parameter by applying a corresponding weight from the respective set of weights to the scale parameter, wherein the corresponding weight is learned from one or more prior iterations of the training, See Esterline, p. 8, equation 3, which shows a scale factor α, being weighted by ω1. Also see ¶ 0104, e.g. “the values of one or more of the variables α, ω1, and b are iteratively calculated during the training phase of the ANN.”
wherein … [a layer] … comprises at least one first node as the scale parameter connected by interconnection weights …, See Esterline, p. 8, equation 3 in ¶ 0103, which shows an activation function including a scale factor α, being weighted by ω1. The scale parameter of equation 3 is interpreted as a node in a model, similar to the depiction of weight nodes in Fig. 2. 
It is noted that Esterline teaches providing multiple scale factors for multiple nodes (e.g. see Table B on p. 7). Esterline also generally teaches the use of a scale factor in an activation function for an arbitrary node in an arbitrary layer (see equation 3 in ¶ 0103). But Esterline does not expressly disclose wherein each of the plurality of layers excluding a final layer comprises at least one first node as the scale parameter connected by weights to each node in a next layer of the NN. However, this is taught by Fogel. See Fogel, Fig. 1 along with col. 7, lines 62-65, e.g. “Every node of the input layer and the hidden layers is connected to every node of the next successive layer through a respective weighted connection or connection weight or simply weight wt.” Also see col. 
wherein each of the plurality of layers excluding the final layer comprises at least one second node as the bias parameter connected by interconnection weights to each node in the next layer of the NN, See Esterline, Fig. 3A, depicting bias “b” connected to a node by interconnection weight “ωb” along with ¶ 0022, e.g. “bias input.” Note that in Fig. 2, weights are depicted using a “node” representation. The bias and weight terms of Fig. 3A are similarly conceptually interpreted as nodes in a model. Also see Fig. 35, depicting a first layer with a first bias b, and a second layer with a second bias bout. It is noted that Esterline teaches providing multiple bias parameters for multiple layers (e.g. see Fig. 35), but does not expressly disclose wherein each of the plurality of layers excluding a final layer comprises at least one second node as the bias parameter connected by weights to each node in the next layer of the NN. However, this is taught by Fogel. See Fogel, Fig. 1 along with col. 7, lines 62-65 and col. 8, lines 7-11 as cited above. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s multiple bias parameters with 
the at least one second node as the bias parameter in one layer being different from the at least one second node as the bias parameter in the next layer; See Esterline, ¶ 0108, e.g. “The output neuron may be replaced with a linear summation module, which may have its own input weights, output weights, and/or dedicated bias.” The description of a “dedicated bias” suggests that the first layer of Fig. 7 is provided with a bias, while the second layer is provided with a bias that is dedicated to that layer. Also see Fig. 35, depicting two distinct bias parameters provided to two different layers.
executing, for the current iteration of the training, an activation function for the particular node, using the combined input and the weighted scale parameter to generate an activation result for the particular node; See Esterline, Fig. 3A, depicting an activation function that utilizes a summation of weighted inputs and bias b. Also see ¶ 0080, e.g. “As is shown in FIG. 3A, input vectors are scaled with individual synaptic weights ω1 through ωn, and fed into a linear summation module. The inputs are linearly summed together along with a bias term, `b`. In FIG. 3A, the resulting linear sum is denoted as `z`. The linear sum is then fed through an activation function to produce an output `y.`” Also see Table A on p. 6, describing a uni-polar sigmoid, e.g.:
$$y = \frac{1}{1 + e^{-\alpha(z)}}$$
Also see ¶ 0103-0107, e.g. “the values of one or more of the variables α, ω1, and b are iteratively calculated.” … “By changing the sign of α, the sign of the slope of the activation function output may also be changed (e.g., from positive slope to negative 
$$Ø = \frac{1}{1 + e^{-\alpha(\omega x + b)}}$$
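For illustration only, the scaled sigmoid quoted above can be evaluated numerically. The following Python sketch is not taken from Esterline; the function name `scaled_sigmoid` and the sample values are assumptions chosen for the example. It shows how the scale factor α multiplies the linear sum ωx + b inside the exponent, and how reversing the sign of α reverses the direction of the output curve.

```python
import math

def scaled_sigmoid(x, omega, b, alpha):
    """Uni-polar sigmoid with scale factor alpha applied to z = omega * x + b."""
    z = omega * x + b
    return 1.0 / (1.0 + math.exp(-alpha * z))

# Hypothetical sample values: the same input evaluated with alpha = +1 and alpha = -1.
rising = scaled_sigmoid(1.0, omega=2.0, b=0.0, alpha=1.0)
falling = scaled_sigmoid(1.0, omega=2.0, b=0.0, alpha=-1.0)
```

With these sample values, `rising` lies above 0.5 and `falling` lies below it, which is consistent with the quoted statement that changing the sign of α changes the sign of the slope.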
determining whether the current layer is the final layer; and outputting, based at least in part on determining whether the current layer is the final layer, the activation result as a classifier output of the NN or providing the activation result as input to the next layer of the NN. See Esterline, ¶ 0078, e.g. “In the system of FIG. 1, the far-right layer is an output layer. A single sweep through the network from left to right results in the assignment of a value to each output node.” Also see ¶ 0092, e.g. “the output classifies the input signal to the desired output.”
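As a reading aid for the limitations mapped above, the layer-by-layer flow can be sketched in Python. This is a minimal sketch under stated assumptions and does not reproduce the implementation of Esterline, Fogel, or the application: the dictionary keys (`weights`, `bias`, `bias_weights`, `scale`, `scale_weights`) and the function name `forward` are hypothetical, and a uni-polar sigmoid is assumed as the activation function.

```python
import math

def forward(layers, inputs):
    # Each layer dict uses hypothetical keys: "weights" (one list per node),
    # "bias", "bias_weights", "scale", and "scale_weights" (one entry per node).
    activations = list(inputs)
    for index, layer in enumerate(layers):
        results = []
        for node, node_weights in enumerate(layer["weights"]):
            # Combined input: weighted inputs plus the weighted bias parameter.
            z = sum(w * a for w, a in zip(node_weights, activations))
            z += layer["bias_weights"][node] * layer["bias"]
            # Weighted scale parameter: a weight applied to the scale parameter.
            weighted_scale = layer["scale_weights"][node] * layer["scale"]
            results.append(1.0 / (1.0 + math.exp(-weighted_scale * z)))
        if index == len(layers) - 1:
            return results  # final layer: emit as the classifier output
        activations = results  # otherwise provide as input to the next layer
    return activations  # reached only when no layers are given
```

The final-layer check inside the loop corresponds to the "determining whether the current layer is the final layer" step: the activation results are either returned as the output or carried forward as the next layer's inputs.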

	In regard to claim 2, Esterline discloses:
2. The computer-implemented method of claim 1, wherein it is determined that the current layer is the final layer, and wherein the activation result is output as the classifier output of the NN. See Esterline, ¶ 0078, e.g. “In the system of FIG. 1, the far-right layer is an output layer. A single sweep through the network from left to right results in the assignment of a value to each output node.” Also see ¶ 0092, e.g. “the output classifies the input signal to the desired output”


	In regard to claim 3, Esterline discloses:
3. The computer-implemented method of claim 2, further comprising: determining a difference between an actual target output and the classifier output; and updating the respective set of weights to be applied during a next iteration of the training based at least in part on the difference between the actual target output and the classifier output. See Esterline, ¶ 0093, e.g. “an error signal representing the difference between the reference signal and the output signal is analyzed … a computer processor solves for the required weights and bias, or biases, to achieve a curve that matches the crystal frequency.”
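By way of illustration, an update of weights based on the difference between a target output and an actual output can be sketched as follows. The sketch assumes a single sigmoid neuron and a generic gradient-descent (delta rule) update; it is not asserted to be the solver described in Esterline ¶ 0093, and the function names and the learning rate are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_weights(weights, inputs, target, learning_rate=0.5):
    """Return updated weights and the target-minus-output difference."""
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    difference = target - output
    # The sigmoid derivative output * (1 - output) scales the correction.
    gradient_scale = difference * output * (1.0 - output)
    new_weights = [w + learning_rate * gradient_scale * x
                   for w, x in zip(weights, inputs)]
    return new_weights, difference
```

Calling `update_weights` repeatedly with the same input and target yields a difference that shrinks from one call to the next, which is the sense in which the weights for the next iteration depend on the current difference.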

	In regard to claim 8, Esterline discloses:
8. A system for training a neural network (NN) comprising a plurality of layers including at least one hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the system comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to: See Esterline, Fig. 17, depicting a system including memory and a processor.
All further limitations have been addressed in the above rejection of claim 1. 

	In regard to claims 9-10, parent claim 8 is addressed above. All further limitations have been addressed in the above rejections of claims 2-3, respectively.


	In regard to claim 15, Esterline discloses:
15. A computer program product for training a neural network (NN) comprising a plurality of layers including at least one hidden layer, wherein each of the plurality of layers includes a respective set of nodes, the computer program product comprising a non-transitory storage medium readable by a processing circuit, the storage medium storing instructions executable by the processing circuit to cause a method to be performed, the method comprising: See Esterline, Fig. 17 along with ¶ 0175, e.g. “The steps of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.”
All further limitations have been addressed in the above rejection of claim 1.

In regard to claims 16-17, parent claim 15 is addressed above. All further limitations have been addressed in the above rejections of claims 2-3, respectively.

Claims 4-5, 11-12, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Fogel, and Gemello as applied above, and further in view of United States Patent Application Publication 2005/0015251 by Pi et al. (“Pi”).

	In regard to claim 4, Esterline discloses:
4. The computer-implemented method of claim 3, further comprising: determining a cumulative error associated with all nodes in the NN; See Esterline, ¶ 0093, e.g. “the squared error at each measured temperature point could be summed 
Esterline does not expressly disclose: determining that the cumulative error exceeds a threshold value; and determining that the next iteration of the training should be performed in response to determining that the cumulative error exceeds the threshold value. However, Pi teaches iterative training and error comparison with a threshold. See Pi, Fig. 6 and ¶ 0034, e.g. “A threshold decision is made at block 7084 as to whether to keep current the value of weights.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s error with Pi’s iterative threshold comparison in order to provide efficient convergence as essentially suggested by Pi (see ¶ 0005).
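The combination discussed above (a summed error compared against a threshold to decide whether to iterate again) can be illustrated with a short Python sketch. It is offered only as an example under stated assumptions: the error is taken to be a sum of squared differences over the outputs, and the function names and threshold values are hypothetical rather than drawn from Esterline or Pi.

```python
def cumulative_squared_error(targets, outputs):
    """Sum of squared differences between target and actual outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

def should_run_next_iteration(targets, outputs, threshold):
    """True when the cumulative error still exceeds the threshold."""
    return cumulative_squared_error(targets, outputs) > threshold
```

For example, targets of `[1.0, 0.0]` and outputs of `[0.5, 0.5]` give a cumulative error of 0.5, so a threshold of 0.1 calls for another iteration while a threshold of 1.0 does not.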

	In regard to claim 5, Esterline and Pi also teach:
5. The computer-implemented method of claim 4, wherein the updating is performed in response to determining that the next iteration of the training should be performed. See Pi, ¶ 0023 and 0034, e.g. “back-propagates the error signal.”

In regard to claims 11-12, parent claim 10 is addressed above. All further limitations have been addressed in the above rejections of claims 4-5, respectively.


Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Fogel, and Gemello as applied above, and further in view of United States Patent Application Publication 2016/0171974 by Hannun et al. (“Hannun”).

	In regard to claim 6, the rejection of parent claim 2 is provided above. Esterline does not expressly disclose claim 6. However, Hannun teaches: 
6. The computer-implemented method of claim 2, further comprising: determining the set of inputs from an acoustic signal; and decoding a set of classifier outputs including the classifier output to determine a character string corresponding to the acoustic signal. See Hannun, Fig. 5 and ¶ 0057, e.g. “find the sequence of characters.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s output with Hannun’s character output in order to provide a simple, high performance speech recognition system as suggested by Hannun (see ¶ 0030).

In regard to claim 13, parent claim 9 is addressed above. All further limitations have been addressed in the above rejection of claim 6.


Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Esterline, Fogel, and Gemello as applied above, and further in view of United States Patent 8,418,249 by Nucci et al. (“Nucci”).

	In regard to claim 7, parent claim 1 is addressed above. Esterline does not expressly disclose the limitations of claim 7. However, Nucci teaches the following:
7. The computer-implemented method of claim 1, further comprising: determining that a threshold number of iterations of the training have been performed, wherein a respective final set of weights to be applied when executing the activation function for each node in the NN is obtained after performing a final iteration of the training. See Nucci, col. 18, line 66 – col. 19, line 1, e.g. “When the maximum number of iterations T is reached, a final kernel is computed.” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Esterline’s iterations with Nucci’s threshold in order to decide when a sufficient computation is accomplished as is essentially suggested by Nucci and known to those of ordinary skill in the art (see Nucci, at least col. 15, lines 57-65).
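For illustration, stopping after a threshold number of iterations and taking the weights from the last iteration as final can be sketched as below. This is a generic loop under assumed names (`train`, `step`, `max_iterations`); it is not the kernel computation of Nucci and makes no claim about how either reference implements the stopping condition.

```python
def train(step, initial_weights, max_iterations):
    """Apply `step` until the iteration cap is reached; return final weights."""
    weights = initial_weights
    iterations = 0
    while iterations < max_iterations:
        weights = step(weights)  # one training iteration (caller-supplied)
        iterations += 1
    # The weights produced by the last iteration are treated as final.
    return weights, iterations
```

Here `step` is any caller-supplied function that maps the current weights to updated weights, so the sketch isolates only the iteration-count condition.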



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. U.S. Patent Application Publication 2009/0157578 by Sellamanickam et al. teaches use of an optimizable scaling parameter. See ¶ 0032. It is noted that weights are often used to adjust or “optimize” a given parameter.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James D Rutten whose telephone number is (571)272-3703.  The examiner can normally be reached on M-F 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571)272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/James D. Rutten/
Primary Examiner, Art Unit 2121