DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The filing date of the present application is 03/06/2017.
This action is in response to amendments and/or remarks filed on 11/19/2020. In the current amendments, claims 1 and 11 have been amended. Claims 1-20 are pending and have been examined. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/19/2020 has been entered. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 11-17 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong et al. (An Information Theoretic Approach to Adaptive System Training Using Unlabeled Data) in view of Kimble (US 2004/0034505 A1).

Regarding claim 1, 
Jeong teaches
A method comprising: 
training a neural network, during a training phase of the neural network, to produce a trained neural network ([figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error.”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]); 

after completing the training phase of the neural network: performing inference, using the trained neural network comprising a first set of weights, on a particular unlabeled input to produce a first output ([figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]);

determining whether the first output violates one or more constraints on output values of the trained neural network ([figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the first output violates one or more constraints on output values of the trained neural network”.);

in response to determining that the first output violates the one or more constraints on output values of the trained neural network: 
customizing the weights of the trained neural network for the first output by using [algorithm], based on an optimization problem, to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
[equation image: media_image1.png]
 (3) where fdtrn(x) is the pdf of desired signal during training phase, fytst(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system. … In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; Eq (3) reads on “optimization problem”.),

wherein the optimization problem (a) is based on the one or more constraints on output values of the trained neural network, and (b) is over weights of the neural network ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “the likelihood functions do not change from the training to the testing set” read on “constraints on output values of the trained neural network”. In addition, eq (3) reads on “optimization problem” and the equation is based on the constraints of probabilities and is for adjusting weights of neural network.);

wherein the first set of weights is different than the second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase” reads on “the first set of weights is different than the second set of weights”.);

performing inference, using the adjusted trained neural network, on the particular unlabeled input to produce a second output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] “In the first simulation, we artificially generated two classes with conditional pdf for class 1, P(u | C1), being a Gaussian distributed with zero mean and unit variance; the conditional pdf for class 2, P(u | C2), has mean vector [2, 2] and unit variance respectively. The two classes have equal a priori probabilities.”; the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations.),

wherein the second output is different from the first output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] as cited above; The classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the second output is different from the first output” since weights are adjusted over iterations.); and 

However, Jeong does not teach
customizing the weights of the trained neural network for the first output by using backpropagation, based on an optimization problem, to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights
wherein the method is performed by one or more computing devices.

Kimble teaches
customizing the weights of the trained neural network for the first output by using backpropagation, based on an optimization problem, to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights ([figs 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”; “backpropagation algorithm” reads on “backpropagation, based on an optimization problem”. For more details, see Rumelhart et al. (Learning representations by back-propagating errors). Note that Jeong teaches “customizing the weights of the trained neural network for the first output”.);

wherein the method is performed by one or more computing devices (fig 2-3).

Jeong and Kimble are both in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural networking system of Jeong with the backpropagation of Kimble. Doing so would enable re-training of the neural network by backpropagation to refine the accuracy of the network (Kimble, pars 98-102).
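For illustration only, the sequence of steps recited in claim 1 (inference, constraint check, weight adjustment, second inference) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Jeong, or Kimble: the one-neuron network, the threshold constraint LOWER, the hinge loss, the learning rate, and all function names are hypothetical.

```python
import math

LOWER = 0.6  # hypothetical constraint: the output must be at least 0.6


def forward(weights, u):
    """Inference with a one-neuron network: y = sigmoid(w . u)."""
    z = sum(w * x for w, x in zip(weights, u))
    return 1.0 / (1.0 + math.exp(-z))


def violates(y, lower=LOWER):
    """True when the output breaks the assumed constraint y >= lower."""
    return y < lower


def customize_weights(weights, u, lower=LOWER, lr=0.5, max_steps=100):
    """Gradient descent on the hinge loss max(0, lower - y) over the weights.

    The gradient is obtained with the chain rule (backpropagation through
    the single sigmoid unit). A new list is returned; the input list is
    left untouched.
    """
    w = list(weights)
    for _ in range(max_steps):
        y = forward(w, u)
        if not violates(y, lower):
            break
        dloss_dy = -1.0          # derivative of (lower - y) with respect to y
        dy_dz = y * (1.0 - y)    # derivative of the sigmoid
        w = [wi - lr * dloss_dy * dy_dz * xi for wi, xi in zip(w, u)]
    return w


first_weights = [0.1, -0.2]      # stands in for the trained weights
u = [1.0, 2.0]                   # a particular unlabeled input
first_output = forward(first_weights, u)

if violates(first_output):
    second_weights = customize_weights(first_weights, u)
else:
    second_weights = first_weights
second_output = forward(second_weights, u)
```

In this sketch the loss is zero exactly when the assumed constraint holds, so the loop stops as soon as the adjusted network produces a conforming output for the same input.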

Regarding claim 2, 
Jeong and Kimble teach claim 1.
Jeong further teaches 
using [algorithm] to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network is performed during a testing phase of the neural network ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited for claim 1; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”),

Kimble further teaches
using backpropagation to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network ([figs 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”).

Jeong and Kimble are both in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural networking system of Jeong and Kimble with the backpropagation of Kimble. Doing so would enable re-training of the neural network by backpropagation to refine the accuracy of the network (Kimble, pars 98-102).

Regarding claim 3, 
Jeong and Kimble teach claim 1.
Jeong further teaches 
after performing inference, using the adjusted trained neural network, on the particular unlabeled input to produce the second output: determining whether the second output violates the one or more constraints on output values of the trained neural network ([figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the second output violates the one or more constraints on output values of the trained neural network”. 
In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations.);

in response to determining that the second output violates the one or more constraints on output values of the trained neural network, using [algorithm] to adjust the second set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; “the likelihood functions P(u | y = Ci) change from training to the testing set” reads on “determining that the second output violates the one or more constraints on output values of the trained neural network”. In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations.),

wherein the third set of weights is different than both the first set of weights and second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase”, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the third set of weights is different than both the first set of weights and second set of weights”.);

Kimble further teaches
using backpropagation to adjust the [set] of weights of the trained neural network to produce a [adjusted] trained neural network comprising a third set of weights ([figs 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”).

Jeong and Kimble are both in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural networking system of Jeong and Kimble with the backpropagation of Kimble. Doing so would enable re-training of the neural network by backpropagation to refine the accuracy of the network (Kimble, pars 98-102).

Regarding claim 4, 
Jeong and Kimble teach claim 1.
Jeong further teaches 
after performing inference, using the adjusted trained neural network, on the particular unlabeled input to produce the second output: determining whether the second output violates the one or more constraints (see the rejections of claim 3);

in response to determining that the second output does not violate the one or more constraints on output values of the trained neural network, storing the second output as an inference result of the particular unlabeled input ([figs 1-7]; [table 1]; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; [sec V] “first simulation” and “second simulation”; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining that the second output does not violate the one or more constraints on output values of the trained neural network”. In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations.);

Regarding claim 5, 
Jeong and Kimble teach claim 1.

Jeong further teaches
the trained neural network, produced by said training the neural network, comprises an original set of weights ([figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error.”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]; “In the training phase, the adaptive system weights are adjusted” reads on “original set of weights” since a set of weights is given after the training phase.);

the method further comprises:
performing inference, using the trained neural network comprising the original set of weights, on a second unlabeled input to produce a third output ([figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]; “novel unlabeled input data {uN+1, …, uK}” reads on “second unlabeled input”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”.);

determining whether the third output violates the one or more constraints on output values of the trained neural network ([figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the third output violates the one or more constraints on output values of the trained neural network”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”.);

in response to determining that the third output violates the one or more constraints on output values of the trained neural network: 
using [algorithm], based on the optimization problem, to adjust the original set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; “the likelihood functions P(u | y = Ci) change from training to the testing set” reads on “determining that the third output violates the one or more constraints on output values of the trained neural network”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”.);

wherein the third set of weights is different than both the original set of weights and the second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase”, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the third set of weights is different than both the original set of weights and the second set of weights”.);

performing inference, using the second adjusted trained neural network, on the second unlabeled input to produce a fourth output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] “In the first simulation, we artificially generated two classes with conditional pdf for class 1, P(u | C1), being a Gaussian distributed with zero mean and unit variance; the conditional pdf for class 2, P(u | C2), has mean vector [2, 2] and unit variance respectively. The two classes have equal a priori probabilities.”; the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “fourth output” since weights are adjusted over iterations.),

wherein the fourth output is different from the third output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] as cited above; The classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the fourth output is different from the third output” since weights are adjusted over iterations.);

Kimble further teaches
using backpropagation, based on the optimization problem, to adjust the original set of weights of the trained neural network to produce a [adjusted] trained neural network comprising a [set] of weights ([figs 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”; “backpropagation algorithm” reads on “backpropagation, based on an optimization problem”. For more details, see Rumelhart et al. (Learning representations by back-propagating errors). Note that Jeong teaches “using [algorithm], based on the optimization problem, to adjust the original set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights”.).

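Jeong and Kimble are both in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural networking system of Jeong and Kimble with the backpropagation of Kimble. Doing so would enable re-training of the neural network by backpropagation to refine the accuracy of the network (Kimble, pars 98-102).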
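For illustration only, the per-input flow recited in claim 5, in which each new unlabeled input is first evaluated with the original set of weights and any adjustment starts from that original set, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or the cited references: the one-neuron network, the threshold constraint, the hinge-loss gradient step, and all names are hypothetical.

```python
import math


def forward(weights, u):
    """Inference with a one-neuron network: y = sigmoid(w . u)."""
    z = sum(w * x for w, x in zip(weights, u))
    return 1.0 / (1.0 + math.exp(-z))


def customize(weights, u, lower, lr=0.5, max_steps=100):
    """Return adjusted weights for one input; the argument is never mutated.

    Each step is gradient descent on the hinge loss max(0, lower - y),
    whose gradient with respect to the weights is -y * (1 - y) * u.
    """
    w = list(weights)
    for _ in range(max_steps):
        y = forward(w, u)
        if y >= lower:           # assumed constraint satisfied, stop adjusting
            break
        slope = y * (1.0 - y)
        w = [wi + lr * slope * xi for wi, xi in zip(w, u)]
    return w


def infer_all(original_weights, inputs, lower):
    """Customize separately for every input, always starting from the original weights."""
    results = []
    for u in inputs:
        w = customize(original_weights, u, lower)
        results.append((forward(w, u), w))
    return results


original = [0.1, -0.2]                  # stands in for the original trained weights
inputs = [[1.0, 2.0], [3.0, -1.0]]      # two unlabeled inputs
results = infer_all(original, inputs, lower=0.6)
```

Because every input starts from a copy of the original weights, an adjustment made for one input does not carry over to the next; the second input in this sketch already satisfies the assumed constraint and keeps the original weights.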

Regarding claim 6, 
Jeong and Kimble teach claim 1.

Jeong further teaches 
determining whether the first output violates one or more constraints on output values of the trained neural network is based on a loss function that encodes the one or more constraints ([figs 1-7]; [table 1]; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
[equation image: media_image1.png]
 (3) where fdtrn(x) is the pdf of desired signal during training phase, fytst(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system.”; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the first output violates one or more constraints on output values of the trained neural network”. In addition, eq (3) reads on “loss function that encodes the one or more constraints”.);

Regarding claim 7, 
Jeong and Kimble teach claim 6.

Jeong further teaches 
receiving a definition of the one or more constraints ([figs 1-7]; [table 1]; [sec III] “Our idea is to combine the unlabeled input data in the testing set and the information of the desired output for training in order to continue adjusting the system weights. We assume here that the a priori probability of each class during testing is the same as for training. We will discuss more details about this hypothesis in the next section. Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
[Equation image: media_image1.png]
 (3) where fdtrn(x) is the pdf of desired signal during training phase, fytst(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system.”; see also [sec IV]); and 

automatically formulating the loss function based on the definition of the one or more constraints ([figs 1-7]; [table 1]; [sec III] as cited above, and “We propose to compute the Euclidean distance pdf matching cost function directly from data samples, i.e. nonparametrically. This requires a smooth (i.e., continuous and differentiable) estimator for the two probability density functions fdtrn(x) and fytst(x). … A batch method is used here to compute the weights update. An online approach is also possible with the introduction of stochastic information gradient [13].”; see also [sec IV]).
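
For illustration only, the cited passage on computing a Euclidean distance between two pdfs directly from data samples can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Jeong as published: the Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian`, `cross_term` and `pdf_distance` are all hypothetical choices made here. It relies on the standard fact that the integral of the product of two Gaussian kernels of width sigma is a Gaussian of width sigma times the square root of 2.

```python
import math


def gaussian(u, sigma):
    """Gaussian density with standard deviation sigma, evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def cross_term(a, b, sigma):
    """Integral of the product of two Gaussian-kernel density estimates.

    Assumption: both estimates use the same kernel width sigma, so the
    integral reduces to a double sum over sample pairs.
    """
    s = sigma * math.sqrt(2.0)  # product of two kernels integrates to a wider one
    return sum(gaussian(x - y, s) for x in a for y in b) / (len(a) * len(b))


def pdf_distance(desired_trn, output_tst, sigma=0.5):
    """Integrated squared difference between the two kernel estimates."""
    return (cross_term(desired_trn, desired_trn, sigma)
            + cross_term(output_tst, output_tst, sigma)
            - 2.0 * cross_term(desired_trn, output_tst, sigma))


# Hypothetical samples: desired signal in training, system output in testing.
near = pdf_distance([0.0, 1.0], [0.1, 1.1])
far = pdf_distance([0.0, 1.0], [5.0, 6.0])
print(near < far)
```

Because the estimate is smooth in the output samples, it could in principle be differentiated with respect to the weights that produce those outputs; that step is not shown here.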

Regarding claim 11, 
Claim 11 is a computer-readable media claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1. Note that Kimble teaches computer-readable media and processors ([figs 2-3]).

Regarding claim 12, 
Claim 12 is a computer-readable media claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2.

Regarding claim 13, 
Claim 13 is a computer-readable media claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3.

Regarding claim 14, 
Claim 14 is a computer-readable media claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 4.

Regarding claim 15, 
Claim 15 is a computer-readable media claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 5.

Regarding claim 16, 
Claim 16 is a computer-readable media claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 6.

Regarding claim 17, 
Claim 17 is a computer-readable media claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 7.

Claims 8, 10, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong et al. (An Information Theoretic Approach to Adaptive System Training Using Unlabeled Data) in view of Kimble (US 2004/0034505 A1), further in view of Bolt et al. (US 2005/0149463 A1).

Regarding claim 8, 
Jeong and Kimble teach claim 7.

However, Jeong and Kimble do not teach
the one or more constraints is defined using one or more of: 
context-free language; or 
regular language.

Bolt teaches 
the one or more constraints is defined using one or more of: 
context-free language; or 
regular language (Table 1 and pars 63-67 discuss the constraints being provided as regular language by way of strings.).

Jeong, Kimble and Bolt are all in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong and Kimble with the regular language constraints of Bolt. Doing so would enable constraints defined in a regular language to be used in adjusting the neural network weights (Bolt, pars 63-67).

Regarding claim 10, 
Jeong and Kimble teach claim 1.

However, Jeong and Kimble do not teach
the one or more constraints are hard constraints.

Bolt teaches 
the one or more constraints are hard constraints ([par 67] “Adding this `hard` constraint (`hard` in the sense that it must be satisfied by the trained network)”).

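Jeong, Kimble and Bolt are all in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong and Kimble with the hard constraints of Bolt (Bolt, pars 63-67).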

Regarding claim 18, 
Claim 18 is a computer-readable media claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 8.

Regarding claim 20, 
Claim 20 is a computer-readable media claim corresponding to the method claim 10, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 10.

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong et al. (An Information Theoretic Approach to Adaptive System Training Using Unlabeled Data) in view of Kimble (US 2004/0034505 A1), further in view of Juang et al. (US 2018/0086222 A1).

Regarding claim 9, 
Jeong and Kimble teach claim 1.

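However, Jeong and Kimble do not teach
the optimization problem comprises a negative energy function that is defined over a collection of output values; and
using backpropagation to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network further comprises: 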
applying stochastic gradient descent to compute a gradient of the negative energy function to determine a set of changes to be applied to the first set of weights; and 
applying the set of changes to the first set of weights to produce the second set of weights.

Juang teaches 
the optimization problem comprises a negative energy function that is defined over a collection of output values ([par 43] “In general, the goal of the training process for a neural network such as those described above is setting the entries of the weight matrices and offset vectors in the formulas set forth above to produce an accurate mapping of inputs to outputs under a wide variety of operating conditions including different ambient temperatures and time-wise current profiles as a battery is discharged into a load. During a training process, training data for the network inputs to the input layer and output(s) of the output layer gathered from experimental measurements on one or more batteries of interest is utilized. The weight and offset parameters may be updated, for example, through backpropagation process using stochastic gradient descent as training data inputs are fed into the network, and the network outputs are compared to the measured outputs of the training data. The gradients for each parameter may be calculated based on a cost function of the squared error between the measured output value (measured voltage) and the forward predicted output value (estimated voltage) at every time step in the training data. The goal is to keep adjusting the parameters so that the cost function approaches zero”); and

using backpropagation to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network further comprises: 
applying stochastic gradient descent to compute a gradient of the negative energy function to determine a set of changes to be applied to the first set of weights ([par 43] “In general, the goal of the training process for a neural network such as those described above is setting the entries of the weight matrices and offset vectors in the formulas set forth above to produce an accurate mapping of inputs to outputs under a wide variety of operating conditions including different ambient temperatures and time-wise current profiles as a battery is discharged into a load. During a training process, training data for the network inputs to the input layer and output(s) of the output layer gathered from experimental measurements on one or more batteries of interest is utilized. The weight and offset parameters may be updated, for example, through backpropagation process using stochastic gradient descent as training data inputs are fed into the network, and the network outputs are compared to the measured outputs of the training data. The gradients for each parameter may be calculated based on a cost function of the squared error between the measured output value (measured voltage) and the forward predicted output value (estimated voltage) at every time step in the training data. The goal is to keep adjusting the parameters so that the cost function approaches zero”; “gradients for each parameter” reads on “a set of changes”. Note that Jeong and Kimble teach “first set of weights”.);

	applying the set of changes to the first set of weights to produce the second set of weights ([par 43] “In general, the goal of the training process for a neural network such as those described above is setting the entries of the weight matrices and offset vectors in the formulas set forth above to produce an accurate mapping of inputs to outputs under a wide variety of operating conditions including different ambient temperatures and time-wise current profiles as a battery is discharged into a load. During a training process, training data for the network inputs to the input layer and output(s) of the output layer gathered from experimental measurements on one or more batteries of interest is utilized. The weight and offset parameters may be updated, for example, through backpropagation process using stochastic gradient descent as training data inputs are fed into the network, and the network outputs are compared to the measured outputs of the training data. The gradients for each parameter may be calculated based on a cost function of the squared error between the measured output value (measured voltage) and the forward predicted output value (estimated voltage) at every time step in the training data. The goal is to keep adjusting the parameters so that the cost function approaches zero”; “gradients for each parameter” reads on “a set of changes”. Note that Jeong and Kimble teach “first set of weights” and “second set of weights”.).

Jeong, Kimble and Juang are all in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong and Kimble with the stochastic gradient descent of Juang. Doing so would allow gradients for each parameter to be calculated based on a cost function (Juang, pars 53-56).
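
For illustration only, the parameter update described in the cited paragraph of Juang (stochastic gradient descent on a squared-error cost) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the network of any cited reference: it uses a single linear unit rather than a multi-layer network, and the name `sgd_fit`, the learning rate, the epoch count and the sample data are hypothetical.

```python
import random


def sgd_fit(samples, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by per-sample gradient steps on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # initial ("first") set of parameters
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic: visit the samples in random order
        for x, target in data:
            error = (w * x + b) - target
            # Gradient of 0.5 * error**2 with respect to w and b.
            grad_w, grad_b = error * x, error
            # Apply the set of changes to obtain the next set of parameters.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
samples = [(float(x), 2.0 * x + 1.0) for x in range(-2, 3)]
w, b = sgd_fit(samples)
print(round(w, 3), round(b, 3))
```

Each pass computes a gradient per parameter from the squared error and subtracts a scaled copy of it, so the cost moves toward zero as the cited paragraph describes.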

Regarding claim 19, 
Claim 19 is a computer-readable media claim corresponding to the method claim 9, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 9.

Response to Arguments
Applicant’s arguments with respect to the independent claims have been considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409.  The examiner can normally be reached on Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.K./Examiner, Art Unit 2123         

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126