DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed on 11/03/2021 have been fully considered but they are not persuasive.
In Remarks, pp. 12-13, Applicant contends: 
“Based on the rejection information at page 4, the Office Action appears to interpret the description in Jeong of continuous training of the machine learning model when the likelihood function is the same, and when it is different, between the training and testing datasets as showing "determining whether the first output violates one or more hard constraints on output values of the trained neural network" recited by Claim 1. However, there is no evidence in Jeong of determining whether the likelihood function changes between a testing and training dataset. As such, it is unclear how Jeong could be interpreted as showing "determining whether the first output violates one or more hard constraints on output values of the trained neural network" recited by Claim 1”

Examiner’s response:
The relevant claim limitation appears to be 
“determining whether the first output violates one or more hard constraints on output values of the trained neural network;”.

As noted in the rejections, Jeong teaches 
[figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {u_{N+1}, …, u_K}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(C_i; training) = P(C_i; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = C_i) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III];

and, Bolt teaches
[par 67] “For example, when the value of the debt of a business decreases (and all of the other details remain unchanged), its credit score should increase. That is to say that the output of the neural network should be negatively monotonic with respect to changes in its value of debt input. Adding this hard constraint (hard in the sense that it must be satisfied by the trained network) also helps to guarantee that the ratings produced by the neural network satisfy basic properties that the credit analysts know should always apply”; 

In other words, Jeong teaches training a machine learning system (e.g., a neural network) (i.e. “the trained neural network”, cf. “neural network”) first with a labeled dataset and then, using the Euclidean distance pdf matching algorithm, with an unlabeled dataset. If the likelihood functions do not change from the training phase to the application phase, the optimal decision boundary found in the training phase remains optimal. When the likelihood functions do change (i.e. “determining whether the first output violates one or more [hard] constraints on output values”, cf. “In the case that the likelihood functions P(u | y = C_i) change from training to the testing set”; the bracketed claim language, “[hard]”, indicates that it is not taught by Jeong but is taught by Bolt, as discussed below), the decision boundary is adjusted and the correct classification probability increases.
In addition, Bolt teaches hard constraints on the output of the neural network (i.e. “one or more hard constraints on output values of the trained neural network”, cf. “hard constraint (hard in the sense that it must be satisfied by the trained network)”).
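As an illustration only, a Bolt-style “negatively monotonic” hard constraint (Bolt, [par 67]) could be checked on a trained model's output by increasing the debt input while holding the other inputs fixed. The following Python sketch is hypothetical: the function name, the probe step sizes and the toy scoring models are assumptions made for this example and are not taken from Bolt, Jeong or the application. Because it probes only a finite set of points, it can detect a violation but cannot prove that the constraint holds everywhere.

```python
from typing import Callable, Dict, Sequence

Inputs = Dict[str, float]


def violates_negative_monotonicity(
    model: Callable[[Inputs], float],
    inputs: Inputs,
    feature: str,
    steps: Sequence[float] = (0.1, 1.0, 10.0),
) -> bool:
    """Return True if increasing `feature` (all else fixed) increases the output.

    Hypothetical post-hoc check of a hard constraint such as "the credit
    score must not increase when the value of debt increases".
    """
    baseline = model(inputs)
    for step in steps:
        probe = dict(inputs)  # copy so the caller's inputs are not mutated
        probe[feature] = inputs[feature] + step
        if model(probe) > baseline:
            return True  # output rose with debt: constraint violated
    return False


# Toy scoring models (assumptions for illustration only).
def compliant_model(x: Inputs) -> float:
    return 700.0 - 2.0 * x["debt"] + 0.5 * x["income"]


def violating_model(x: Inputs) -> float:
    return 700.0 + 0.3 * x["debt"]


borrower = {"debt": 50.0, "income": 100.0}
print(violates_negative_monotonicity(compliant_model, borrower, "debt"))  # False
print(violates_negative_monotonicity(violating_model, borrower, "debt"))  # True
```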

Therefore, the applicant’s arguments are not convincing.

In Remarks, p. 13, Applicant contends: 
“Furthermore, there is no evidence in Jeong of "customizing the weights of the trained neural network for the first output by using backpropagation to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights, wherein said customizing the weights lowers a probability associated with the first output", as recited by Claim 1.”

Examiner’s response:
The relevant claim limitation appears to be 
“customizing the weights of the trained neural network for the first output by using backpropagation to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights, 
wherein said customizing the weights lowers a probability associated with the first output”

As noted in the rejections, Jeong teaches 
[figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {u_{N+1}, …, u_K}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(C_i; training) = P(C_i; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = C_i) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
$$J(w) = \int \left( f_d^{trn}(x) - f_y^{tst}(x) \right)^2 dx \qquad (3)$$
where f_d^{trn}(x) is the pdf of desired signal during training phase, f_y^{tst}(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system. … In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”

and, Kimble teaches
[fig 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”;

In other words, Jeong teaches training a machine learning system (e.g., a neural network) (i.e. “the trained neural network”, cf. “neural network”) with a labeled dataset in the training phase, and then adjusting the system weights with an unlabeled dataset in the application phase based on the Euclidean distance pdf matching algorithm (i.e. “customizing the weights of the trained neural network for the first output by using [backpropagation] to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights”, cf. “adapt the system weights in application phase”; the bracketed claim language, “[backpropagation]”, indicates that it is not taught by Jeong but is taught by Kimble, as discussed below). When the likelihood functions change in the application phase, the decision boundary is adjusted so that the correct classification probability increases and, correspondingly, the incorrect classification probability decreases (i.e. “lowers a probability associated with the first output”, cf. “correct classification probability increases”).
In addition, Kimble teaches re-training the neural network using backpropagation (i.e. “backpropagation to adjust the first set of weights of the trained neural network”, cf. “Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”).
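To make the Euclidean distance pdf matching criterion relied on above more concrete, the following Python sketch estimates J = ∫ (f_d^{trn}(x) − f_y^{tst}(x))² dx from two sets of one-dimensional samples using Gaussian Parzen window estimates of each pdf. The Parzen estimator, the kernel width sigma and the sample values are assumptions made for illustration; the sketch is not represented as Jeong's implementation and does not reproduce Jeong's gradient update with respect to w.

```python
import math
from typing import Sequence


def gaussian(u: float, sigma: float) -> float:
    """1-D Gaussian kernel G_sigma(u)."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def cross_term(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Integral of f_a(x) * f_b(x) dx for Gaussian Parzen estimates f_a, f_b.

    Uses the identity: integral of G_s(x - a_i) * G_s(x - b_j) dx = G_{s*sqrt(2)}(a_i - b_j).
    """
    s = sigma * math.sqrt(2.0)
    total = sum(gaussian(ai - bj, s) for ai in a for bj in b)
    return total / (len(a) * len(b))


def euclidean_pdf_distance(d: Sequence[float], y: Sequence[float], sigma: float = 0.5) -> float:
    """Estimate J = integral of (f_d(x) - f_y(x))^2 dx from samples d and y."""
    return cross_term(d, d, sigma) - 2.0 * cross_term(d, y, sigma) + cross_term(y, y, sigma)


desired = [0.0, 1.0]  # hypothetical desired-signal samples (training phase)
print(euclidean_pdf_distance(desired, [0.0, 1.0]))  # 0.0: identical samples
print(euclidean_pdf_distance(desired, [0.5, 1.5]))  # small: outputs partly overlap
print(euclidean_pdf_distance(desired, [5.0, 6.0]))  # larger: outputs far apart
```

Expanding the square into three cross terms avoids numerical integration, since the integral of a product of two Gaussian kernels has a closed form.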

Therefore, the applicant’s arguments are not convincing.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 11-18 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong et al. (An Information Theoretic Approach to Adaptive System Training Using Unlabeled Data) in view of Bolt (US 2005/0149463 A1), and further in view of Kimble (US 2004/0034505 A1).

Regarding claim 1, 
Jeong teaches
A method comprising: 

(Jeong, [figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error.”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]); 

after completing the training phase of the neural network: 
performing inference, using the trained neural network comprising a first set of weights, on a particular unlabeled input to produce a first output 
(Jeong, [figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {u_{N+1}, …, u_K}”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]; e.g., outputs obtained from “novel unlabeled input data” may read on “first output”.);

(Note: Hereinafter, claim language enclosed in brackets (i.e. [ ]) indicates that the bracketed language is not taught by the prior art reference currently being discussed but is taught by another prior art reference discussed afterwards.)

determining whether the first output violates one or more [hard] constraints on output values of the trained neural network;
(Jeong, [figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {u_{N+1}, …, u_K}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(C_i; training) = P(C_i; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = C_i) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = C_i) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the first output violates one or more constraints on output values of the trained neural network”.)

in response to determining that the first output violates the one or more [hard] constraints on output values of the trained neural network: 
customizing the weights of the trained neural network for the first output by using [backpropagation] to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights 
(Jeong, [figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
$$J(w) = \int \left( f_d^{trn}(x) - f_y^{tst}(x) \right)^2 dx \qquad (3)$$
where f_d^{trn}(x) is the pdf of desired signal during training phase, f_y^{tst}(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system. … In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”),

wherein said customizing the weights lowers a probability associated with the first output; 
(Jeong, [figs 1-7]; [table 1]; [sec II] as cited above; [sec IV] as cited above, and “Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases”; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
$$J(w) = \int \left( f_d^{trn}(x) - f_y^{tst}(x) \right)^2 dx \qquad (3)$$
where f_d^{trn}(x) is the pdf of desired signal during training phase, f_y^{tst}(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system. … In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; e.g., “correct classification probability increases” may read on “lowers a probability associated with the first output” since “correct classification probability increases” means that an incorrect classification probability decreases.)

wherein the first set of weights is different than the second set of weights 
(Jeong, [figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase” reads on “the first set of weights is different than the second set of weights”.);

performing inference, using the adjusted trained neural network, on the particular unlabeled input to produce a second output 
(Jeong, [figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] “In the first simulation, we artificially generated two classes with conditional pdf for class 1, P(u | C_1), being a Gaussian distributed with zero mean and unit variance; the conditional pdf for class 2, P(u | C_2), has mean vector [2, 2] and unit variance respectively. The two classes have equal a priori probabilities.”; the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations.),

wherein the second output is different from the first output 
(Jeong, [figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] as cited above; The classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the second output is different from the first output” since weights are adjusted over iterations.); and 

However, Jeong does not teach
determining whether the first output violates one or more [hard] constraints on output values of the trained neural network;
in response to determining that the first output violates the one or more [hard] constraints on output values of the trained neural network: 

wherein the method is performed by one or more computing devices.

Bolt teaches 
determining whether the first output violates one or more hard constraints on output values of the trained neural network; 
(Bolt, [par 67] “For example, when the value of the debt of a business decreases (and all of the other details remain unchanged), its credit score should increase. That is to say that the output of the neural network should be negatively monotonic with respect to changes in its value of debt input. Adding this hard constraint (hard in the sense that it must be satisfied by the trained network) also helps to guarantee that the ratings produced by the neural network satisfy basic properties that the credit analysts know should always apply”; Note that Jeong teaches “determining whether the first output violates one or more [hard] constraints on output values of the trained neural network”.).

in response to determining that the first output violates the one or more hard constraints on output values of the trained neural network: 
(Bolt, [par 67] “For example, when the value of the debt of a business decreases (and all of the other details remain unchanged), its credit score should increase. That is to say that the output of the neural network should be negatively monotonic with respect to changes in its value of debt input. Adding this hard constraint (hard in the sense that it must be satisfied by the trained network) also helps to guarantee that the ratings produced by the neural network satisfy basic properties that the credit analysts know should always apply”; Note that Jeong teaches “in response to determining that the first output violates the one or more [hard] constraints on output values of the trained neural network”.)

Jeong and Bolt are in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong with the hard constraint of Bolt. Doing so would help guarantee that the outputs produced by the trained neural network satisfy basic properties that are known to always apply.
(Bolt, [pars 63-67], Adding this hard constraint (hard in the sense that it must be satisfied by the trained network) also helps to guarantee that the ratings produced by the neural network satisfy basic properties that the credit analysts know should always apply).

However, Jeong and Bolt do not teach
customizing the weights of the trained neural network for the first output by using [backpropagation] to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights;
wherein the method is performed by one or more computing devices.

Kimble teaches
customizing the weights of the trained neural network for the first output by using backpropagation to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights; 
(Kimble, [fig 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”; e.g., “backpropagation algorithm” may read on “backpropagation, based on an optimization problem”. For more details, see Rumelhart et al. (Learning representations by back-propagating errors). Note that Jeong teaches “customizing the weights of the trained neural network for the first output by using [backpropagation] to adjust the first set of weights of the trained neural network to produce an adjusted trained neural network comprising a second set of weights”.);

wherein the method is performed by one or more computing devices (Kimble, figs 2-3).

Jeong, Bolt and Kimble are all in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong and Bolt with the backpropagation of Kimble. Doing so would enable re-training the neural network by backpropagation to refine the accuracy of the network (Kimble, pars 98-102).

Regarding claim 2, 
Jeong, Bolt and Kimble teach claim 1.
Jeong further teaches 
using [backpropagation] to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network is performed during a testing phase of the neural network ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited for claim 1; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”).

Kimble further teaches
using backpropagation to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network ([fig 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”).

Jeong, Bolt and Kimble are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 3, 
Jeong, Bolt and Kimble teach claim 1.
Jeong further teaches 
after performing inference, using the adjusted trained neural network, on the particular unlabeled input to produce the second output: 
determining whether the second output violates the one or more hard constraints on output values of the trained neural network ([figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = w^T u), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {u_{N+1}, …, u_K}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(C_i; training) = P(C_i; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = C_i) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = C_i) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the second output violates the one or more … constraints on output values of the trained neural network”. 
In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations. Note that Bolt teaches “hard constraints”.);

in response to determining that the second output violates the one or more hard constraints on output values of the trained neural network, using [backpropagation] to adjust the second set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; “the likelihood functions P(u | y = C_i) change from training to the testing set” reads on “determining that the second output violates the one or more … constraints on output values of the trained neural network”. In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations. Note that Bolt teaches “hard constraints”.),

wherein the third set of weights is different than both the first set of weights and second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase”, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the third set of weights is different than both the first set of weights and second set of weights”.);

Kimble further teaches
using backpropagation to adjust the second set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights ([fig 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”).

Jeong and Bolt are combinable with Kimble for the same rationale as set forth above with respect to claim 1.
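For orientation only, the retraining step Kimble describes (“the network is again trained with the backpropagation algorithm using the additional data”) can be sketched in Python for a hypothetical two-layer sigmoid network. The layer sizes, learning rate, epoch count, and squared-error cost are illustrative assumptions, not taken from Kimble, Jeong, or the application; the sketch only shows that backpropagation applied to an existing set of weights returns a different set of weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hidden, w_out, x):
    """Two-layer forward pass; the last entry of each weight row is a bias."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1]) for row in w_hidden]
    y = sigmoid(sum(w * hj for w, hj in zip(w_out[:-1], h)) + w_out[-1])
    return h, y

def retrain(w_hidden, w_out, data, lr=0.5, epochs=200):
    """Backpropagation on (x, target) pairs; returns a NEW set of weights."""
    w_hidden = [list(row) for row in w_hidden]  # copy: the prior weight set is left intact
    w_out = list(w_out)
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(w_hidden, w_out, x)
            d_out = (y - t) * y * (1.0 - y)  # output delta for cost 0.5 * (y - t)^2
            # hidden deltas, propagated back through the (pre-update) output weights
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
            for j, hj in enumerate(h):
                w_out[j] -= lr * d_out * hj
            w_out[-1] -= lr * d_out
            for j, row in enumerate(w_hidden):
                for i, xi in enumerate(x):
                    row[i] -= lr * d_hid[j] * xi
                row[-1] -= lr * d_hid[j]
    return w_hidden, w_out
```

Calling retrain on a stored weight set with a batch of additional (input, target) pairs yields a new weight set while the stored set is preserved, which is the property relied on when a “third set of weights” differs from the earlier sets.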

Regarding claim 4, 
Jeong, Bolt and Kimble teach claim 1.
Jeong further teaches 
(see the rejection of claim 3);

in response to determining that the second output does not violate the one or more hard constraints on output values of the trained neural network, storing the second output as an inference result of the particular unlabeled input 
(Jeong, [figs 1-7]; [table 1]; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; [sec V] “first simulation” and “second simulation”; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining that the second output does not violate the one or more … constraints on output values of the trained neural network”. In addition, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “second output” since weights are adjusted over iterations. Note that Bolt teaches “hard constraints”.);
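The inference, constraint-check, and adjustment sequence recited in claims 3 and 4, as mapped above, can be sketched as follows. This is a hypothetical illustration only: it assumes a linear filter y = wTu (one of the adaptive systems named with Jeong's eq (2)) and an assumed hard constraint that the output lie within [lo, hi], enforced through a squared-violation penalty. None of these choices is taken from Jeong, Bolt, Kimble, or the application.

```python
def infer(w, u):
    """Inference with a linear filter y = w^T u (assumed adaptive system)."""
    return sum(wi * ui for wi, ui in zip(w, u))

def violates(y, lo, hi):
    """Hypothetical hard constraint: the output must lie in [lo, hi]."""
    return y < lo or y > hi

def adjust(w, u, lo, hi, lr=0.1):
    """One gradient step on the penalty max(0, lo - y)^2 + max(0, y - hi)^2."""
    y = infer(w, u)
    dloss_dy = -2.0 * max(0.0, lo - y) + 2.0 * max(0.0, y - hi)
    # for a linear filter dy/dw_i = u_i, so the weight gradient is dloss_dy * u_i
    return [wi - lr * dloss_dy * ui for wi, ui in zip(w, u)]

def constrained_inference(w, u, lo, hi, max_steps=100):
    """Infer, check the constraint, and adjust weights until the output complies.

    Returns (stored_output, weight_history). stored_output is None if the
    constraint is still violated after max_steps; weight_history[0] is the
    original weight set and each later entry is a newly adjusted set.
    """
    history = [list(w)]
    y = infer(w, u)
    for _ in range(max_steps):
        if not violates(y, lo, hi):
            return y, history  # a compliant output is stored as the inference result
        w = adjust(w, u, lo, hi)
        history.append(w)
        y = infer(w, u)  # re-run inference with the adjusted weights
    return (None if violates(y, lo, hi) else y), history
```

When the first output already satisfies the assumed constraint, the history holds only the original weights; otherwise each adjustment appends a new weight set that differs from the previous ones.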

Regarding claim 5, 
Jeong, Bolt and Kimble teach claim 1.

Jeong further teaches
the trained neural network, produced by said training the neural network, comprises an original set of weights 
(Jeong, [figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error.”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]; “In the training phase, the adaptive system weights are adjusted” reads on “original set of weights” since a set of weights is given after the training phase.);

the method further comprises:
performing inference, using the trained neural network comprising the original set of weights, on a second unlabeled input to produce a third output ([figs 1-2]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec V] “In the simulation, we used the same neural network topology for training and testing.”; see also [sec III]; “novel unlabeled input data {uN+1, …, uK}” reads on “second unlabeled input”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”.);

determining whether the third output violates the one or more hard constraints on output values of the trained neural network 
([figs 1-7]; [table 1]; [sec II] “y=g(u,w) (2) The adaptive system could be a linear filter (y = wTu), a neural network … In the training phase, the adaptive system weights are adjusted to obtain an approximation to f or classify the input data into different categories by minimizing the error. In the application (testing) phase, the weights are fixed and the trained system is tested on novel unlabeled input data {uN+1, …, uK}”; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; see also [sec III]; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the third output violates the one or more … constraints on output values of the trained neural network”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”. Note that Bolt teaches “hard constraints”.);

in response to determining that the third output violates the one or more hard constraints on output values of the trained neural network: 
([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] “In order to adjust the weights in the testing phase, we take the derivative of J with respect to w to obtain the gradient descent update”; “the likelihood functions P(u | y = Ci) change from training to the testing set” reads on “determining that the third output violates the one or more … constraints on output values of the trained neural network”. In addition, eq (2) with “{uN+1, …, uK}” reads on “third output”. Note that Bolt teaches “hard constraints”.);

wherein the third set of weights is different than both the original set of weights and the second set of weights ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec III] as cited above; “adjust the weights in the testing phase”, the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the third set of weights is different than both the original set of weights and the second set of weights”.);

performing inference, using the second adjusted trained neural network, on the second unlabeled input to produce a fourth output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] “In the first simulation, we artificially generated two classes with conditional pdf for class 1, P(u | C1), being a Gaussian distributed with zero mean and unit variance; the conditional pdf for class 2, P(u | C2), has mean vector [2, 2] and unit variance respectively. The two classes have equal a priori probabilities.”; the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “fourth output” since weights are adjusted over iterations.),

wherein the fourth output is different from the third output ([figs 1-7]; [table 1]; [sec II] and [sec IV] as cited above; [sec V] as cited above; the classification probability curve over iterations of fig 3 and the decision boundary changes of figs 1-2 and figs 4-7 read on “the fourth output is different from the third output” since weights are adjusted over iterations.);

Kimble further teaches
using backpropagation to adjust the original set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights 
(Kimble, [fig 7]; [pars 53-56] “An artificial neural network is conceptually a directed graph of artificial neurons (also called nodes) and connections, each connection running between two nodes in a single direction, from a source node to a target node. An adaptive weight is associated with each connection. The adaptive weight is a coefficient which is applied to the output of the Source node to produce a portion of the input to the target node. … Preferably, data glove support application 312 contains a neural network function, which uses stored parameters to construct and execute an artificial neural network. Specifically, the number of network nodes at each level, the connection weights for each connection, and the node output function, can all be stored as parameters of a particular neural network. These parameters are stored in profile 323.”; [pars 98-102] “The sensor data from keys which have not been eliminated as errors is then used as additional training data to update the neural network. I.e., the network is again trained with the backpropagation algorithm using the additional data (step 723). … Periodic re-training of the neural network by backpropagation with the additional training data helps to further refine the accuracy of the network”; “backpropagation algorithm” reads on “backpropagation, based on an optimization problem”. For more details, see Rumelhart et al. (Learning representations by back-propagating errors). Note that Jeong teaches “using [backpropagation], based on the optimization problem, to adjust the original set of weights of the trained neural network to produce a second adjusted trained neural network comprising a third set of weights”.).



Regarding claim 6, 
Jeong, Bolt and Kimble teach claim 1.

Jeong further teaches 
determining whether the first output violates the one or more hard constraints on output values of the trained neural network is based on a loss function that encodes the one or more hard constraints ([figs 1-7]; [table 1]; [sec IV] “Since we use the probability of output data for training as a pseudo desired signal in testing phase in order to continue updating the classifier weights, we have to assume that priors do not change from the training to the testing set, i.e. P(Ci; training) = P(Ci; testing) for i = 1, 2. In the case that the likelihood functions P(u | y = Ci) change from training to the testing set, obviously the optimal decision boundary obtained from training set will not be the optimal for testing set. Under this condition, our algorithm will adjust the decision boundary to a better position such that the correct classification probability increases. In the case that the likelihood functions do not change from the training to the testing set, then according to the Bayesian equation (9) the optimal decision boundary obtained from training phase will remain optimal, since the value of the a posterior probability remains the same. Then there is no need to apply our algorithm.”; [sec III] “Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
    min_w J(w) = ∫ (fdtrn(x) - fytst(x))² dx   (3)
where fdtrn(x) is the pdf of desired signal during training phase, fytst(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system.”; “the likelihood functions P(u | y = Ci) change from training to the testing set” and “the likelihood functions do not change from the training to the testing set” read on “determining whether the first output violates one or more … constraints on output values of the trained neural network”. In addition, eq (3) reads on “loss function that encodes the one or more constraints”. Note that Bolt teaches “hard constraints”.)
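Jeong states that the Euclidean distance pdf matching cost of eq (3) is computed “directly from data samples, i.e. nonparametrically” using smooth density estimators (Jeong, [sec III]). Purely as an illustration, and assuming one-dimensional Gaussian Parzen windows of a single hypothetical width sigma (the reference's exact kernel and width are not reproduced here), the integral of (f_a(x) - f_b(x))² has a closed form V(a,a) + V(b,b) - 2V(a,b), where V(a,b) averages, over all sample pairs, a Gaussian of width sigma·√2 evaluated at a_i - b_j. A minimal sketch:

```python
import math

def gauss(x, sigma):
    """Gaussian kernel with standard deviation sigma, evaluated at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def potential(a, b, sigma):
    """Integral of the product of the Parzen pdfs of samples a and b (Gaussian kernels)."""
    s = sigma * math.sqrt(2.0)  # convolving two width-sigma Gaussians gives width sigma*sqrt(2)
    return sum(gauss(ai - bj, s) for ai in a for bj in b) / (len(a) * len(b))

def euclidean_pdf_distance(d_train, y_test, sigma=0.5):
    """Integral of (f_d(x) - f_y(x))^2 dx for Parzen estimates of the two pdfs."""
    return (potential(d_train, d_train, sigma)
            + potential(y_test, y_test, sigma)
            - 2.0 * potential(d_train, y_test, sigma))
```

In a test-phase adaptation of the kind Jeong describes, y_test would be the system outputs g(u, w) on unlabeled inputs and w would be moved to reduce this value; that gradient step is not shown.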

Regarding claim 7, 
Jeong, Bolt and Kimble teach claim 6.

Jeong further teaches 
receiving a definition of the one or more hard constraints ([figs 1-7]; [table 1]; [sec III] “Our idea is to combine the unlabeled input data in the testing set and the information of the desired output for training in order to continue adjusting the system weights. We assume here that the a priori probability of each class during testing is the same as for training. We will discuss more details about this hypothesis in the next section. Based on this assumption, the Euclidean distance pdf matching algorithm is proposed to adapt the system weights in application phase as 
    min_w J(w) = ∫ (fdtrn(x) - fytst(x))² dx   (3)
where fdtrn(x) is the pdf of desired signal during training phase, fytst(x) is the pdf of system output signal during testing phase and w is the weight vector of the adaptive system.”; see also [sec IV]. Note that Bolt teaches “hard constraints”.) 

automatically formulating the loss function based on the definition of the one or more hard constraints ([figs 1-7]; [table 1]; [sec III] as cited above, and “We propose to compute the Euclidean distance pdf matching cost function directly from data samples, i.e. nonparametrically. This requires a smooth (i.e., continuous and differentiable) estimator for the two probability density functions fdtrn(x) and fytst(x). … A batch method is used here to compute the weights update. An online approach is also possible with the introduction of stochastic information gradient [13].”; see also [sec IV]. Note that Bolt teaches “hard constraints”.)

Regarding claim 8, 
Jeong, Bolt and Kimble teach claim 7.

Bolt further teaches 
the one or more hard constraints is defined using one or more of: 
context-free language; or 
regular language 
(Bolt, [par 67] “For example, when the value of the debt of a business decreases (and all of the other details remain unchanged), its credit score should increase. That is to say that the output of the neural network should be negatively monotonic with respect to changes in its value of debt input. Adding this hard constraint (hard in the sense that it must be satisfied by the trained network) also helps to guarantee that the ratings produced by the neural network satisfy basic properties that the credit analysts know should always apply”; Table 1 and pars 63-67 discuss the constraints being provided as regular language by way of strings);

Jeong and Kimble are combinable with Bolt for the same rationale as set forth above with respect to claim 1.
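Bolt's example hard constraint (par 67), under which the output “should be negatively monotonic with respect to changes in its value of debt input”, can be spot-checked on a trained model by sweeping one input while holding the others fixed. The sketch below is hypothetical: the model callables, the input index, and the probe values are assumptions, and a check at finitely many probe points does not prove monotonicity over the whole input range.

```python
from typing import Callable, Sequence

def is_negatively_monotonic(model: Callable[[Sequence[float]], float],
                            base: Sequence[float], index: int,
                            values: Sequence[float]) -> bool:
    """True if the model output never increases as input[index] increases over `values`."""
    outputs = []
    for v in sorted(values):
        x = list(base)
        x[index] = v  # vary only the chosen input (e.g. debt); hold the others fixed
        outputs.append(model(x))
    return all(later <= earlier for earlier, later in zip(outputs, outputs[1:]))
```

For example, a hypothetical scoring function 700 - 2·debt passes the check, while one with an added 0.1·debt² term fails once debt exceeds 10.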

Regarding claim 11, 
Claim 11 is a computer-readable media claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 1 ([figs 2-3]).

Regarding claim 12, 
Claim 12 is a computer-readable media claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 2.

Regarding claim 13, 
Claim 13 is a computer-readable media claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 3.

Regarding claim 14, 
Claim 14 is a computer-readable media claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 4.

Regarding claim 15, 
Claim 15 is a computer-readable media claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 5.

Regarding claim 16, 
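Claim 16 is a computer-readable media claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 6.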

Regarding claim 17, 
Claim 17 is a computer-readable media claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 7.

Regarding claim 18, 
Claim 18 is a computer-readable media claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 8.

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong et al. (An Information Theoretic Approach to Adaptive System Training Using Unlabeled Data) in view of Bolt (US 2005/0149463 A1), further in view of Kimble (US 2004/0034505 A1), and further in view of Juang et al. (US 2018/0086222 A1).

Regarding claim 9, 
Jeong, Bolt and Kimble teach claim 1.

using backpropagation to adjust the first set of weights of the trained neural network to produce the adjusted trained neural network further comprises: (see the rejection of claim 1)

However, Jeong, Bolt and Kimble do not teach

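applying stochastic gradient descent to compute a gradient of a negative energy function to determine a set of changes to be applied to the first set of weights; and

applying the set of changes to the first set of weights to produce the second set of weights.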

Juang teaches 
applying stochastic gradient descent to compute a gradient of a negative energy function to determine a set of changes to be applied to the first set of weights 
(Juang, [par 43] “In general, the goal of the training process for a neural network such as those described above is setting the entries of the weight matrices and offset vectors in the formulas set forth above to produce an accurate mapping of inputs to outputs under a wide variety of operating conditions including different ambient temperatures and time-wise current profiles as a battery is discharged into a load. During a training process, training data for the network inputs to the input layer and output(s) of the output layer gathered from experimental measurements on one or more batteries of interest is utilized. The weight and offset parameters may be updated, for example, through backpropagation process using stochastic gradient descent as training data inputs are fed into the network, and the network outputs are compared to the measured outputs of the training data. The gradients for each parameter may be calculated based on a cost function of the squared error between the measured output value (measured voltage) and the forward predicted output value (estimated voltage) at every time step in the training data. The goal is to keep adjusting the parameters so that the cost function approaches zero”; “gradients for each parameter” reads on “a set of changes”. Note that Jeong, Bolt and Kimble teach “first set of weights”.);

applying the set of changes to the first set of weights to produce the second set of weights 
(Juang, [par 43] “In general, the goal of the training process for a neural network such as those described above is setting the entries of the weight matrices and offset vectors in the formulas set forth above to produce an accurate mapping of inputs to outputs under a wide variety of operating conditions including different ambient temperatures and time-wise current profiles as a battery is discharged into a load. During a training process, training data for the network inputs to the input layer and output(s) of the output layer gathered from experimental measurements on one or more batteries of interest is utilized. The weight and offset parameters may be updated, for example, through backpropagation process using stochastic gradient descent as training data inputs are fed into the network, and the network outputs are compared to the measured outputs of the training data. The gradients for each parameter may be calculated based on a cost function of the squared error between the measured output value (measured voltage) and the forward predicted output value (estimated voltage) at every time step in the training data. The goal is to keep adjusting the parameters so that the cost function approaches zero”; “gradients for each parameter” reads on “a set of changes”. Note that Jeong, Bolt and Kimble teach “first set of weights” and “second set of weights”.).

Jeong, Bolt, Kimble and Juang are all in the same field of endeavor of processing input signals with neural network systems and are analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network system of Jeong, Bolt and Kimble with the stochastic gradient descent of Juang. Doing so would lead to calculating gradients for each parameter based on a cost function (Juang, par 43).
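Juang (par 43) describes updating parameters “through backpropagation process using stochastic gradient descent” on “a cost function of the squared error”. The following minimal sketch assumes a single linear unit with a bias (a simplification of Juang's weight matrices and offset vectors) and an arbitrary learning rate; it shows per-sample gradients producing a set of changes that is applied to a first set of weights to yield a second set. It is not Juang's implementation and does not model the claimed negative energy function.

```python
import random

def sgd_epoch(w, b, samples, lr=0.05, rng=None):
    """One SGD pass over (x, target) pairs for y = w.x + b with cost (y - t)^2."""
    rng = rng or random.Random(0)
    order = list(range(len(samples)))
    rng.shuffle(order)  # stochastic: visit the samples in random order
    w = list(w)
    for k in order:
        x, t = samples[k]
        y = sum(wi * xi for wi, xi in zip(w, x)) + b
        g = 2.0 * (y - t)                             # d(cost)/dy
        changes = [-lr * g * xi for xi in x]          # the set of changes for the weights
        w = [wi + dw for wi, dw in zip(w, changes)]   # apply changes -> new weight set
        b -= lr * g
    return w, b
```

Repeated epochs on noiseless samples of a linear target drive the weights toward that target; each call returns a weight list distinct from the one passed in.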

Regarding claim 19, 
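Claim 19 is a computer-readable media claim corresponding to the method claim 9, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of claim 9.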

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Jeong et al. (A new classifier based on information theoretic learning with unlabeled data) teaches Information Theoretic learning.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409.  The examiner can normally be reached on Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.K./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129