United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov

BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/914,222
Filing Date: 7 Mar 2018
Appellant(s): International Business Machines Corporation



Wayne P. Bailey
For Appellant


EXAMINER’S ANSWER





This is in response to the appeal brief filed March 25, 2022, appealing the Non-Final Action mailed March 9, 2022.


(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated March 9, 2022 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
(2) Response to Argument
Appellant argues: (see p. 10 top, claims 1, 11 and 16): “…However, Rastogi does not describe that its 'structural component' ('object') itself sends the 'current measurement' ('sensor data') - as would be required per the claimed 'sending' aspects of Claim 1. Instead, Rastogi describes that its 'measurement system' sends the 'current measurement' ('sensor data') per Rastogi paragraph [0018], lines 9-15 - and the Rastogi 'structural component' and 'measurement system' are separate/distinct components that are not equivalent to one another, as clearly depicted by Rastogi Figure 1, element 102 ('structural component') and element 106 ('measurement system'). Thus, it is urged that Claim 1 has been erroneously rejected due to such 'object sending' – based prima facie obviousness deficiencies since the Rastogi 'structural component' (alleged as being equivalent to the claimed 'object' sending the sensor data) does not itself send the 'current measurement' (alleged as being equivalent to the claimed 'sensor data').”


[Figure: Rastogi, Fig. 1]
Examiner answers: See Fig. 1: the measurement system 106 receives the current state of the structural component 102 of an aircraft and provides data to the PHM system 108 for uncertainty analysis.
Rastogi further teaches (Rastogi, [0016] “… a measurement system 106 configured to measure, sense, or otherwise quantify a size of a physical defect in the structural component 102 and provide measurement data indicative of the current size of the physical defect to a prognostics health management (PHM) system 108.”). Because the sensor data that measurement system 106 sends to PHM system 108 indicate the current size of the defect in structural component 102, the data provided by 106 represent the status of 102. Therefore, the current status of structural component 102 sent from 102 to 106 is equivalent to the sensor data sent from 106 to 108.

Appellant argues: (see p. 10 bottom, claims 1, 11 and 16): “… Notably, Rastogi does not describe that the 'maintenance' it performs is based on an impact of simulating this 'aleatoric uncertainty module'. Instead, the 'aleatoric uncertainty module' itself performs a 'calculation' - actual 'aleatoric uncertainty' is not simulated… Notably, does not describe that the 'maintenance' it performs is based on an impact of simulating this 'epistemic uncertainty module'. Instead, the 'epistemic uncertainty module' itself performs a 'calculation' - actual 'epistemic uncertainty' is not simulated.” 

Examiner answers: As described in Rastogi [0027] and [0028], the uncertainty modeling application 220 models, i.e., ‘simulates,’ both aleatoric and epistemic uncertainty and determines a remaining usage life (RUL) output value. Further, as stated in Rastogi's abstract, maintenance is scheduled/performed based on the RUL. Because the RUL is determined by modeling, i.e., ‘simulating,’ aleatoric and epistemic uncertainty ([0027], [0028]), the maintenance is performed based on ‘simulating’ aleatoric and epistemic uncertainty.

Appellant argues: (see p. 11 bottom, claims 1, 11 and 16): “…Goel describes… Instead, 'inputs and of outputs' a node/layer are dropped out - with no mention of (1) any epistemic uncertainty simulation or (2) dropping out any actual 'nodes' themselves, as claimed.” 

Examiner answers: (1) As indicated in the Office Action on page 6, lines 12-13, Kendall is relied upon to teach epistemic uncertainty simulation. Specifically, Kendall teaches epistemic uncertainty and dropout; because Kendall does not describe the details of dropout, Goel is relied upon to teach dropping out a node from each layer. The limitation “simulating… epistemic uncertainty…” is recited here to show the connection between Kendall and Goel. (2) Dropout does not necessarily require deleting neurons/nodes. Under the broadest reasonable interpretation (BRI), dropout includes temporarily and randomly deactivating neurons by multiplying their inputs or outputs by zero. If the input or output of a neuron is multiplied by zero, that neuron/node is dropped out. Further, Goel expressly teaches dropping out a node in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed limitation.
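For illustration only, the following minimal Python sketch shows the equivalence relied upon above, using a hypothetical two-layer network with ReLU hidden nodes. The network, its randomly drawn weights, and the helper names `forward` and `forward_pruned` are illustrative assumptions and are not taken from Kendall or Goel; the rescaling of kept nodes used in many dropout implementations is omitted so that the two outputs can be compared directly.

```python
import random


def forward(x, w1, w2, mask=None):
    """Two-layer network with ReLU hidden nodes; mask[j] == 0 drops hidden node j."""
    hidden = []
    for j, row in enumerate(w1):
        h = max(0.0, sum(wij * xi for wij, xi in zip(row, x)))  # ReLU activation
        if mask is not None:
            h *= mask[j]  # multiplying the node's output by zero drops it out
        hidden.append(h)
    return sum(v * h for v, h in zip(w2, hidden))


def forward_pruned(x, w1, w2, dropped):
    """Same network with hidden node `dropped` and all of its incident edges removed."""
    w1_kept = [row for j, row in enumerate(w1) if j != dropped]  # incoming edges removed
    w2_kept = [v for j, v in enumerate(w2) if j != dropped]      # outgoing edge removed
    return forward(x, w1_kept, w2_kept)


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(3)]
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w2 = [rng.uniform(-1, 1) for _ in range(4)]

masked = forward(x, w1, w2, mask=[1, 0, 1, 1])  # node 1 zeroed out
pruned = forward_pruned(x, w1, w2, dropped=1)   # node 1 and its edges deleted
```

Because the zeroed hidden output contributes nothing to the next layer, the masked network and the pruned network produce the same output for any choice of dropped node.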

Appellant argues: (see p. 12 top, claims 1, 11 and 16): “Failure to Comply with US Supreme Court Precedent#1…  Hernandez-Lobatoc and Goel combination does not process any type of 'crack growth progression' - and therefore would have no need to make any type of 'crack growth progression' determination. Failure to Comply with US Supreme Court Precedent#2 Claim I has thus been erroneously rejected due to the Examiner's failure to comply with the above described US Supreme Court mandate to resolve the level of ordinary skill in the pertinent art for which, against this background, the obviousness or non-obviousness of the subject matter is then determined.” 

Examiner answers: Regarding precedent #1, as cited in the Office Action (Rastogi, [0027] “the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression…”; see also [0028]), Rastogi teaches using uncertainty modeling to predict crack growth progression and remaining usage life. Other prior art teaches neural-network techniques for such modeling: Kendall teaches using a neural network for uncertainty modeling, and Goel teaches dropout training for neural networks. Therefore, applying Kendall's neural-network uncertainty model and Goel's dropout training to Rastogi's real-world sensor data/measurements would improve the uncertainty modeling of those sensor data and, as a result, provide better prediction of crack growth progression and remaining usage life. Regarding precedent #2, the level of ordinary skill for the claimed feature is that of a person skilled in the art of uncertainty modeling.

Appellant argues: (see p. 14 bottom, claims 3, 13 and 18): “Failure to Comply with US Supreme Court Precedent#1…  as the Examiner provides no reasons at all (see, e.g., page 10, lines 1-6 of the current Office Action).” 

Examiner answers: Kendall is relied upon to teach claim 3, and because Kendall is the primary reference, no rationale for combination is needed for this claim.

Appellant argues: (see p. 16 top, claims 4, 14 and 19): “… Failure to Comply with US Supreme Court Precedent#1… as the Examiner provides no reasons at all” 

Examiner answers: The rationale for combining the teachings of Kendall and Hernandez-Lobatoc is the same as that set forth in the rejection of claim 1. The relied-upon features of Hernandez-Lobatoc were already incorporated into the combination described in the rejection of claim 1; the rejection of the dependent claims does not propose any further modification, but only demonstrates how the combination and rationale set forth for claim 1 cover the limitations of the dependent claims.

Appellant argues: (see p. 17 bottom, claims 7 and 22): “…  Rastogi does not describe that its 'current measurement' ('sensor data') of a physical defect is received from its 'structural component' ('object'), as would be required per the features of Claim 7 ("receiving, by the computer, the sensor data from the object"). Instead, Rastogi describes that its 'current measurement' ('sensor data') is received from its 'measurement system' (Rastogi paragraph [0018], lines 9-15) - and the Rastogi 'structural component' and 'measurement system' are separate/distinct components that are not equivalent to one another, as clearly depicted by Rastogi Figure 1, element 102 ('structural component') and element 106 ('measurement system'). Thus, it is urged that Claim 7 has been erroneously rejected due to such receiving-based prima facie obviousness deficiencies.” 


[Figure: Rastogi, Fig. 1]
Examiner answers: See Fig. 1: the measurement system 106 receives the current state of the structural component 102 of an aircraft and provides data to the PHM system 108 for uncertainty analysis.
Rastogi further teaches (Rastogi, [0016] “… a measurement system 106 configured to measure, sense, or otherwise quantify a size of a physical defect in the structural component 102 and provide measurement data indicative of the current size of the physical defect to a prognostics health management (PHM) system 108.”). Because the sensor data that measurement system 106 sends to PHM system 108 indicate the current size of the defect in structural component 102, the data provided by 106 represent the status of 102. Therefore, the current status of structural component 102 sent from 102 to 106 is equivalent to the sensor data sent from 106 to 108.

Appellant argues: (see p. 18 middle, claims 7 and 22): “… First, in rejecting the first step/action of Claim 7, the Examiner alleges that Rastogi's 'current measurement' of a physical defect is equivalent to the claimed 'sensor data'. Then, in rejecting the second step/action of Claim 7, the Examiner alleges that Rastigo's 'degradation development data' is equivalent to the claimed 'sensor data'. Such inconsistent interpretation of what is alleged to teach the claimed 'sensor data' is clear error.” 

[Figure: Rastogi, Fig. 1]
Examiner answers: See Fig. 1: the uncertainty analysis module 122 receives the degradation development data from the degradation development module 120, and the degradation development data is based on the current measurement of the defect.

Rastogi teaches (Rastogi, [0020] “... an uncertainty analysis module 122 that receives the degradation development data from the degradation development module 120 and calculates or otherwise determines a probabilistic representation of the degradation development data [an intensity level of the sensor data] using the maintenance threshold”; [0005] “The method involves determining degradation development data for the physical defect based at least in part on a current measurement of the physical defect”). Because the degradation development data is based on the current measurement of the defect, i.e., on the sensor data, using and analyzing the degradation development data is equivalent to using the measurement/sensor data.

Appellant argues: (see p. 18 bottom, claims 7 and 22): “…  Instead, the Rastigo 'degradation development data' is received from the 'degradation development module' (Rastigo paragraph [0020], lines 1-3). Thus, it is urged that Claim 7 has been erroneously rejected due to such determining based prima facie obviousness deficiencies” 


[Figure: Rastogi, Fig. 1]
Examiner answers: See Fig. 1: the uncertainty analysis module 122 receives the degradation development data from the degradation development module 120, and the degradation development data is based on the current measurement of the defect.


Rastogi teaches (Rastogi, [0020] “... an uncertainty analysis module 122 that receives the degradation development data from the degradation development module 120 and calculates or otherwise determines a probabilistic representation of the degradation development data [an intensity level of the sensor data] using the maintenance threshold”; [0005] “The method involves determining degradation development data for the physical defect based at least in part on a current measurement of the physical defect”). Because the degradation development data is based on the current measurement of the defect, i.e., on the sensor data, using and analyzing the degradation development data is equivalent to using the measurement/sensor data.

Appellant argues: (see p. 19 top, claims 7 and 22): “…  Failure to Comply with US Supreme Court Precedent…” 

Examiner answers: Appellant does not explain why the rationale for the combination including Rastogi is improper; therefore, the rationale remains as set forth in the Office Action.

Appellant argues: (see p. 20 middle, claim 24): “(1) failure to comply with US Supreme Court precedent#1, and (2) failure to comply with US Supreme Court precedent#2.”

Examiner answers: As cited in the Office Action (Rastogi, [0027] “the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression…”; see also [0028]), Rastogi teaches using uncertainty modeling to predict crack growth progression and remaining usage life. Other prior art teaches neural-network techniques for such modeling: Kendall teaches using a neural network for uncertainty modeling, and Goel teaches dropout training for neural networks. Therefore, applying Kendall's neural-network uncertainty model and Goel's dropout training to Rastogi's real-world sensor data/measurements would improve the uncertainty modeling of those sensor data and, as a result, provide better prediction of crack growth progression and remaining usage life.

Appellant argues: (see p. 20 middle, claim 24): “… Here, aleatoric uncertainty is simulated to measure what the artificial neural network does not understand from sensor data received from an object. Restated, the 'does not understand' is with respect to the actual 'sensor data' received from the object- and not a 'does not understand' without any bounds/qualifications… Thus, it is urged that the combined teachings of the cited references do not describe that the 'does not understand' is with respect to 'sensor data' received from an 'object'.” 

Examiner answers: Kendall teaches (Kndl., p. 1 “… Aleatoric uncertainty captures noise inherent in the observations. This could be for example sensor noise or motion noise, resulting in uncertainty which cannot be reduced even if more data were to be collected.”). Rastogi teaches (Rastogi, [0027] “the uncertainty modeling application 220 includes an aleatoric uncertainty module 222…”; [0005] “The method involves determining degradation development data for the physical defect based at least in part on a current measurement of the physical defect”). Kendall teaches simulating aleatoric uncertainty from observations [sensor data], but Kendall does not teach the ‘bounds’ feature, i.e., ‘sensor data received from an object.’ Therefore, Rastogi is relied upon to teach the ‘bounds’ feature. Because both Kendall and Rastogi teach simulating aleatoric uncertainty, they are combined to teach the limitation.
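For illustration only, a minimal Python sketch of one way aleatoric (observation) noise can be learned from sensor data, assuming a network that predicts both a value and a log-variance s = log(sigma^2) for each observation and is trained with the heteroscedastic loss form 0.5 * exp(-s) * (y - y_hat)^2 + 0.5 * s commonly associated with Kendall. The function name `aleatoric_loss` and the sample reading are hypothetical.

```python
import math


def aleatoric_loss(y_true, y_pred, log_var):
    """Mean heteroscedastic regression loss.

    log_var holds the network's predicted s = log(sigma^2) for each observation;
    a large predicted variance down-weights that observation's squared residual,
    while the 0.5 * s term penalizes predicting large variances everywhere.
    """
    total = 0.0
    for y, y_hat, s in zip(y_true, y_pred, log_var):
        total += 0.5 * math.exp(-s) * (y - y_hat) ** 2 + 0.5 * s
    return total / len(y_true)


# Hypothetical noisy reading: observed 2.0, predicted 0.0 (residual of 2.0).
loss_low_var = aleatoric_loss([2.0], [0.0], [0.0])             # sigma^2 = 1
loss_high_var = aleatoric_loss([2.0], [0.0], [math.log(4.0)])  # sigma^2 = 4
```

A reading with a large residual incurs a lower loss when a larger variance is predicted for it, which is how the predicted variance comes to capture noise inherent in the sensor observations.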

Appellant argues: (see p. 21 middle, claim 24): “… such 'adding' is not performed to effectuate 'aleatoric uncertainty' simulation, as claimed. Instead, this alleged 'adding' step is performed 'to train an artificial network' (Hernandez-Lobatoc Section 3)… such 'measuring' is not performed to effectuate 'aleatoric uncertainty' simulation, as claimed. Instead, this alleged 'measuring' step is performed 'to train an artificial network' (Hernandez-Lobatoc Section 3). Thus, it is further urged that Claim 24 has been erroneously rejected due to such 'aleatoric uncertainty' simulation-based prima facie obviousness deficiencies that is effectuated by both an (1) 'adding' and (2) 'measuring' step/action.” 

Examiner answers: Hernandez-Lobatoc teaches adding weight adjustments and measuring their impact in a Bayesian neural network, and Kendall teaches using a Bayesian neural network to simulate/learn aleatoric uncertainty. Because both Hernandez-Lobatoc and Kendall teach Bayesian neural networks, in the combination the ‘adding’ and ‘measuring’ steps effectuate the 'aleatoric uncertainty' simulation.

Appellant argues: (see p. 22 top, claim 24): “… Notably, this 'back propagation technique' is not described as adding 'random values' to 'edge weights between nodes', as claimed. Instead, weights are 'updated' with 'derivatives of the training loss'. To the extent that Hernandez mentions random weights in its alternative 'forward propagation' technique, there is no mention of a specific 'adding' step where 'random values' are added to edge weights during backpropagation. Thus, it is further urged that Claim 24 has been erroneously rejected due to such backpropagation-based prima facie obviousness deficiencies.” 

Examiner answers: (1) Backpropagation computes the gradient (derivatives) of the loss function (model error, training loss) with respect to the weights, and these derivatives are edge weight adjustments. Further, because the weights in probabilistic backpropagation (PBP) are random (each weight is approximated by a Gaussian distribution), the PBP update rule accordingly updates the means and variances of those distributions; that is, the distribution represents ‘the weights between nodes’ being adjusted [added random values] by the derivatives [weight adjustments]. (2) In Bayesian deep learning, i.e., in a Bayesian neural network, each connection weight has its own mean and variance and is described by a posterior distribution. Therefore, the weights are random in both forward and backward propagation. Further, Hernandez-Lobatoc teaches (“In the second phase, the derivatives of the training loss with respect to the weights are propagated back from the output layer [during backpropagation of output data of the artificial neural network] towards the input. These derivatives are used to update the weights…”). Because Hernandez-Lobatoc teaches ‘updating’ ‘weights’ in the ‘second phase,’ it teaches ‘adding’ ‘random values w’ during ‘backpropagation.’ (Instead of a single value w, each synaptic weight has a mean and variance in a distribution, which are values describing the randomness of w.)
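For illustration only, a minimal Python sketch of a mean-and-variance weight update of the kind described above. It assumes a toy model with a single Gaussian weight w ~ N(m, v), one observation y = w * x + Gaussian noise, and the assumed-density-filtering update m_new = m + v * dlogZ/dm, v_new = v - v^2 * [(dlogZ/dm)^2 - 2 * dlogZ/dv], which is understood to be the form PBP applies to each weight. This is not Hernandez-Lobatoc's multi-layer implementation, and the function names are hypothetical. For this linear-Gaussian toy model the update can be checked against the exact Bayesian posterior.

```python
def adf_weight_update(m, v, x, y, noise_var):
    """Update the mean m and variance v of a Gaussian weight w ~ N(m, v)
    after observing y = w * x + noise, with noise ~ N(0, noise_var).

    Z = N(y; m * x, v * x**2 + noise_var) is the evidence of the observation;
    the new moments are computed from the derivatives of log Z.
    """
    s2 = v * x * x + noise_var  # predictive variance of y
    r = y - m * x               # prediction error
    dlogz_dm = x * r / s2
    dlogz_dv = x * x * (r * r / (2 * s2 * s2) - 1 / (2 * s2))
    m_new = m + v * dlogz_dm                            # mean adjustment
    v_new = v - v * v * (dlogz_dm ** 2 - 2 * dlogz_dv)  # variance adjustment
    return m_new, v_new


def exact_posterior(m, v, x, y, noise_var):
    """Closed-form Bayesian posterior for the same linear-Gaussian model (for checking)."""
    v_post = 1 / (1 / v + x * x / noise_var)
    m_post = v_post * (m / v + x * y / noise_var)
    return m_post, v_post


m_new, v_new = adf_weight_update(0.0, 1.0, x=2.0, y=3.0, noise_var=0.5)
m_ref, v_ref = exact_posterior(0.0, 1.0, x=2.0, y=3.0, noise_var=0.5)
```

Both the mean and the variance of the weight are changed by amounts computed from derivatives, so it is the distribution describing the random weight that is adjusted, rather than a single fixed value w.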

Appellant argues: (see p. 22 middle, claim 24): “… Since Hernandez-Lobatoc does not describe 'added random values' per a backpropagation technique, as shown above, it cannot describe measuring an impact on the 'output data' by (non-taught) 'added random values', as claimed. Thus, it is further urged that Claim 24 has been erroneously rejected due to such impact measuring-based prima facie obviousness deficiencies pertaining to 'added random values' and 'backpropagation'.” 

Examiner answers: In Bayesian deep learning, i.e., in a Bayesian neural network, each connection weight has its own mean and variance and is described by a posterior distribution. Therefore, the weights are random in both forward and backward propagation. Further, Hernandez-Lobatoc teaches (“In the second phase, the derivatives of the training loss with respect to the weights are propagated back from the output layer [during backpropagation of output data of the artificial neural network] towards the input. These derivatives are used to update the weights…”). Because Hernandez-Lobatoc teaches ‘updating’ ‘weights’ in the ‘second phase,’ it teaches ‘adding’ ‘random values w’ during ‘backpropagation.’ (Instead of a single value w, each synaptic weight has a mean and variance in a distribution, which are values describing the randomness of w.)

Appellant argues: (see p. 22 bottom, claim 25): “… (1) failure to comply with US Supreme Court precedent#1, and (6) failure to comply with US Supreme Court precedent#2.”

Examiner answers: As cited in the Office Action (Rastogi, [0027] “the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression…”; see also [0028]), Rastogi teaches using uncertainty modeling to predict crack growth progression and remaining usage life. Other prior art teaches neural-network techniques for such modeling: Kendall teaches using a neural network for uncertainty modeling, and Goel teaches dropout training for neural networks. Therefore, applying Kendall's neural-network uncertainty model and Goel's dropout training to Rastogi's real-world sensor data/measurements would improve the uncertainty modeling of those sensor data and, as a result, provide better prediction of crack growth progression and remaining usage life.

Appellant argues: (see p. 23 top, claim 25): “… Notably, there is no mention of any 'simulating'3 of such epistemic uncertainty. Instead, 'epistemic uncertainty' per se is described without regards to any 'simulating' thereof, as claimed. Thus, it is urged that Claim 25 has been erroneously rejected due to such simulating based prima facie obviousness deficiencies” 

Examiner answers: Predicting epistemic uncertainty in a model is simulating epistemic uncertainty. Further, in the preamble, Kendall teaches (Kndl., p. 1 “We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models [an artificial neural network] for vision tasks.”; p. 3 “To capture epistemic uncertainty in a neural network (NN)… Such a model is referred to as a Bayesian neural network (BNN)...”). Modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models is ‘simulating’ epistemic vs. aleatoric uncertainty.
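For illustration only, a minimal Python sketch of Monte Carlo dropout, one approach to approximating a Bayesian neural network in order to estimate epistemic uncertainty: the same input is passed through the network several times with random dropout masks, and the spread of the outputs is taken as the model uncertainty. The network, its weights, and the function name `mc_dropout_predict` are hypothetical assumptions, not Kendall's implementation.

```python
import random
import statistics


def mc_dropout_predict(x, w1, w2, p, passes, rng):
    """Run `passes` stochastic forward passes with dropout rate p (0 <= p < 1) and
    return the mean prediction and the variance of the predictions (epistemic estimate)."""
    outputs = []
    for _ in range(passes):
        total = 0.0
        for row, v in zip(w1, w2):
            h = max(0.0, sum(wij * xi for wij, xi in zip(row, x)))  # ReLU hidden node
            keep = 1.0 if rng.random() >= p else 0.0  # drop the node with probability p
            total += v * h * keep / (1.0 - p)         # inverted-dropout rescaling
        outputs.append(total)
    return statistics.fmean(outputs), statistics.pvariance(outputs)


x = [1.0, 0.5]
w1 = [[0.5, 0.2], [0.3, -0.1], [0.8, 0.4]]
w2 = [1.0, -0.5, 0.7]
mean, epistemic_var = mc_dropout_predict(x, w1, w2, p=0.5, passes=200, rng=random.Random(0))
```

With p = 0 every pass is identical and the variance is zero; with dropout enabled, the variance across passes reflects how much the prediction depends on which nodes are present.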

Appellant argues: (see p. 23 bottom, claim 25): “… Instead, 'inputs and outputs' of a node/layer are dropped out - with no mention of (1) any 'epistemic uncertainty' simulation or (2) dropping out actual 'nodes' themselves, as claimed.” 

Examiner answers: (1) As indicated in the Office Action on page 6, lines 12-13, Kendall is relied upon to teach ‘epistemic uncertainty simulation.’ Specifically, Kendall teaches epistemic uncertainty and dropout; because Kendall does not describe the details of dropping out a node, Goel is relied upon to teach dropping out a node from each layer. The limitation “simulating… epistemic uncertainty…” is recited here to show the connection between Kendall and Goel. (2) Dropout does not necessarily require deleting neurons/nodes. Under the broadest reasonable interpretation (BRI), dropout includes temporarily and randomly deactivating neurons by multiplying their inputs or outputs by zero. If the input or output of a neuron is multiplied by zero, that neuron/node is dropped out. Further, Goel expressly teaches dropping out a node in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed limitation.

Appellant argues: (see p. 24 middle, claims 2, 12 and 17): “… Instead, 'inputs' and 'outputs' of a node - but not the 'node' itself- are dropped out by a fixed percentage amount. Thus, it is urged that Claim 2 has been erroneously rejected due to such node-based prima facie obviousness deficiencies.” 

Examiner answers: Dropout does not necessarily require deleting neurons/nodes. It deactivates neurons temporarily and randomly by multiplying their inputs or outputs by zero; if the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel expressly teaches dropping out a node in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed limitation.

Appellant argues: (see p. 25 middle, claims 2, 12 and 17): “… Restated, the alleged teaching of 'using the sensor data' is not itself used as a part of the alleged 'selecting' step/action, as claimed. Thus, it is further urged that Claim 2 has been erroneously rejected due to such additional prima facie obviousness deficiencies.” 

Examiner answers: Rastogi teaches ‘using sensor data’ from the real-world environment, and Goel teaches dropout training for neural networks, i.e., ‘selecting’ a node to be dropped. Further, the rationale for combining these references is: “by applying the uncertainty model using neural network of Kendall and dropout training of neural networks of Goel to the real-world sensor data/measurement of Rastogi, it will improve the uncertainty modeling for processing those sensor data.” Therefore, Goel and Rastogi are combined to teach the limitation.

Appellant argues: (see p. 25 bottom – p. 26 bottom, claims 2, 12 and 17): “… Notably, there is no mention of (1) selecting a particular 'node' itself to be randomly dropped from a particular layer, (2) 'applying' an intensity level of the sensor data to a 'probability density function', and (3) using an 'applying' an intensity level to a 'probability density function' as the basis for selecting a particular node to be dropped out, as claimed. Instead, Goel describes that hyperparameters can be reduced or increased.”

Examiner answers: Dropout does not necessarily require deleting neurons/nodes. It deactivates neurons temporarily and randomly by multiplying their inputs or outputs by zero; if the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel expressly teaches dropping out a node in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). (1) Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches selecting a node to be dropped out. Further, Goel teaches the ‘applying’ feature in [0030] and [0067] (“the probability distribution over dropout rates… dropout rate parameter) that may optionally depend on the current state… of input”). (2) Because the current state of the input data is applied to the dropout rate parameter, which is the parameter of a probability function of the dropout rate, Goel teaches 'applying' an intensity level of the sensor data to a 'probability density function.' (3) Because this probability function, or its parameter, is used to adjust the dropout, it is the basis for selecting a node to be dropped out.
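For illustration only, a minimal Python sketch of the mapping described in items (2) and (3), assuming a hypothetical scheme in which a normalized sensor intensity in [0, 1] sets the parameters of a Beta density over the dropout rate, and a rate drawn from that density is then used to select which nodes of one layer are dropped. The Beta parameterization, the `concentration` value, and the function names are illustrative assumptions and are not taken from Goel.

```python
import random


def dropout_rate_density_params(intensity, concentration=10.0):
    """Hypothetical mapping from a normalized sensor intensity in [0, 1] to the
    (alpha, beta) parameters of a Beta density over the dropout rate.
    Higher intensity shifts the density toward higher dropout rates."""
    alpha = 1.0 + concentration * intensity
    beta = 1.0 + concentration * (1.0 - intensity)
    return alpha, beta


def select_dropped_nodes(intensity, layer_size, rng):
    """Draw a dropout rate from the intensity-dependent density, then select
    which nodes of one layer are dropped using that rate."""
    alpha, beta = dropout_rate_density_params(intensity)
    rate = rng.betavariate(alpha, beta)  # dropout rate sampled from the density
    dropped = [j for j in range(layer_size) if rng.random() < rate]
    return dropped, rate


dropped, rate = select_dropped_nodes(intensity=0.8, layer_size=8, rng=random.Random(42))
```

In this sketch the intensity never selects a node directly; it shapes the density, and the density governs which nodes are selected.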

Appellant argues: (see p. 26 bottom, claims 2, 12 and 17): “… Failure to Comply with US Supreme Court Precedent#1… and therefore would have no need to make any type of 'crack growth progression' determination. Failure to Comply with US Supreme Court Precedent#2…”

Examiner answers: As cited in the Office Action (Rastogi, [0027] “the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression…”; see also [0028]), Rastogi teaches using uncertainty modeling to predict crack growth progression and remaining usage life. Other prior art teaches neural-network techniques for such modeling: Kendall teaches using a neural network for uncertainty modeling, and Goel teaches dropout training for neural networks. Therefore, applying Kendall's neural-network uncertainty model and Goel's dropout training to Rastogi's real-world sensor data/measurements would improve the uncertainty modeling of those sensor data and, as a result, provide better prediction of crack growth progression and remaining usage life.

Appellant argues: (see p. 29 top, claims 5, 15 and 20): “…Most notably, Gal does not describe that 'the obtained output' is input into each different type of probability density function corresponding to each respective hidden layer in the plurality of hidden layers, as is provided by the plain meaning of Claim 5. Instead, Gal describes ‘the neural network's output on input xi’. This 'output' is not input into 'each different type of probability density function', as claimed. Thus, it is urged that Claim 5 has been erroneously rejected due to such output/input based prima facie obviousness deficiencies.” 

Examiner answers: Gal teaches this feature on p. 3 (“The optimisation objective that follows from… f^w(x_i) the neural network’s output on input x_i when evaluated with weight matrices realization” and “a set of random weight matrices w = {W_l}, l = 1, ..., L, with L layers”).
When the weights w of each of the L layers are evaluated, the obtained output f^w(x_i) is used in the objective function for training. That is, in the neural network training process, the obtained output f^w(x_i) is input/provided to the neural network model for calculating the objective function and adjusting the random weights w of each layer, which are described by a probability density function. Therefore, Gal teaches the claimed limitation.

Appellant argues: (see p. 29 middle, claims 5, 15 and 20): “… Notably, there is no mention of (1) generating 'edge weight adjustments' between nodes, (2) 'probabilities' of occurrence of the obtained output, or (3) using 'probabilities' of occurrence of the obtained output as the basis for generating 'edge weight adjustments' between nodes. Instead, an 'optimization objective' is described for a neural network having 'a set of random weight matrices'.” 

Examiner answers: Gal teaches this feature on p. 3 (“The optimisation objective that follows from… f^w(x_i) the neural network’s output on input x_i when evaluated with weight matrices realization, and p(y_i | f^w(x_i)) the model’s likelihood, e.g. a Gaussian with mean f^w(x_i).”).
(1) A person skilled in the art would know that weights are parameters between nodes and that an objective function is used to generate weight adjustments during training. Gal teaches that w are the weight parameters between nodes, L(θ) is the objective function, and f^w(x_i) is the output. Therefore, the objective function L(θ) is used to generate weight adjustments between nodes based on f^w(x_i). (2) p(y_i | f^w(x_i)) is the probability of occurrence of the obtained output. (3) f^w(x_i) is used in the objective function, i.e., f^w(x_i) is input/provided to the neural network for calculating the objective function and generating weight adjustments between nodes. Therefore, Gal teaches the claimed limitation.
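For illustration only, a minimal Python sketch of how a Gaussian likelihood p(y_i | f^w(x_i)) with mean f^w(x_i) yields a weight adjustment, assuming a hypothetical one-weight model f^w(x) = w * x with noise precision `tau`. Only the likelihood term of the objective is shown; the regularization term of Gal's objective L(θ) and the sampling of random weights are omitted, and the function names are hypothetical.

```python
import math


def gaussian_nll(y, f, tau):
    """-log p(y | f) for a Gaussian likelihood with mean f and precision tau."""
    return 0.5 * tau * (y - f) ** 2 - 0.5 * math.log(tau / (2 * math.pi))


def weight_adjustment(w, xs, ys, tau, lr):
    """Gradient-descent adjustment for the hypothetical model f^w(x) = w * x,
    computed from the mean negative log-likelihood of the obtained outputs."""
    grad = 0.0
    for x, y in zip(xs, ys):
        f = w * x                   # obtained output f^w(x_i)
        grad += -tau * (y - f) * x  # d/dw of -log p(y_i | f^w(x_i))
    grad /= len(xs)
    return -lr * grad               # adjustment added to the weight


def mean_nll(w, xs, ys, tau):
    return sum(gaussian_nll(y, w * x, tau) for x, y in zip(xs, ys)) / len(xs)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w, tau, lr = 0.5, 1.0, 0.05
adj = weight_adjustment(w, xs, ys, tau, lr)
loss_before = mean_nll(w, xs, ys, tau)
loss_after = mean_nll(w + adj, xs, ys, tau)
```

The adjustment is computed entirely from the obtained outputs f^w(x_i) through the likelihood, and applying it raises the probability of the observed targets.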

Appellant argues: (see p. 29 bottom, claims 5, 15 and 20): “… Failure to Comply with US Supreme Court Precedent” 

Examiner answers: The Appellant does not provide any details as to why the rationale for combining the prior art of Gal is improper; therefore, the rationale remains as set forth in the Office Action.

Appellant argues: (see p. 31 bottom, claims 6 and 21): “… Notably, no operational steps are described in this section 2, so this cited passage does not teach the claimed 'backpropagating' step/action. In contrast, per Claim 6, 'the model error' that is determined based on a delta between a target output and the obtained output (per Claim 4) is what is backpropagated4. Thus, it is urged that Claim 6 has been erroneously rejected due to such 'model error' backpropagation-based prima facie obviousness deficiencies.” 

Examiner answers: Claim 6 does not require any details of “a delta between a target output and the obtained output”; this feature is recited in claim 4 and is taught by Kendall. Instead, claim 6 only requires that the model error be backpropagated, although a skilled person would know that this model error/training loss is based on a delta/difference between the target output and the model output.

Appellant argues: (see p. 32 top, claims 6 and 21): “…In contrast, per the features of Claim 6, edge weights between nodes are updated based on the level of contribution by each respective node to the 'model error' - and further evidencing that Claim 6 has been erroneously rejected due to such node/'model error' contribution-based prima facie obviousness deficiencies.” 

Examiner answers: Back-propagation computes the gradients (derivatives) of the loss function (i.e., the model error or training loss) with respect to the weights, iterating backward from the last layer to the first layer, as taught by Hernandez-Lobatoc. The derivative d(loss)/d(node weight), also known as the gradient, represents the level of contribution by each respective node to the loss function/model error.
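As an illustrative sketch only, ordinary back-propagation on a hypothetical network with one input, two tanh hidden nodes, one linear output, and a squared-error loss is shown below. The network, weights, and names are assumptions; this is plain gradient back-propagation, not the probabilistic variant of Hernandez-Lobatoc. The error signal starts at the last layer and is propagated backward, and delta_hidden[j] quantifies hidden node j's contribution to the model error.

```python
import math

def forward(x, w_in, w_out):
    """w_in[j]: edge from the input to hidden node j; w_out[j]: edge from hidden node j to the output."""
    h = [math.tanh(w * x) for w in w_in]
    y_hat = sum(wo * hj for wo, hj in zip(w_out, h))
    return h, y_hat

def loss(x, y, w_in, w_out):
    """Model error for one sample: 0.5 * (y_hat - y)^2."""
    _, y_hat = forward(x, w_in, w_out)
    return 0.5 * (y_hat - y) ** 2

def backprop(x, y, w_in, w_out):
    """Return dL/dw_out, dL/dw_in, and each hidden node's share of the error signal."""
    h, y_hat = forward(x, w_in, w_out)
    delta_out = y_hat - y                          # dL/dy_hat at the last layer
    grad_out = [delta_out * hj for hj in h]        # dL/dw_out[j]
    # Move one layer back: chain rule through w_out[j] and tanh'(.) = 1 - tanh^2
    delta_hidden = [delta_out * wo * (1.0 - hj ** 2) for wo, hj in zip(w_out, h)]
    grad_in = [dj * x for dj in delta_hidden]      # dL/dw_in[j]
    return grad_out, grad_in, delta_hidden

x, y = 0.7, 1.0                                    # hypothetical training sample
w_in, w_out = [0.3, -0.8], [1.2, 0.5]              # hypothetical edge weights
g_out, g_in, contributions = backprop(x, y, w_in, w_out)
```

The analytic gradients can be checked against finite differences of the loss, which is a common way to confirm a back-propagation implementation.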

Appellant argues: (see p. 32 middle, claims 6 and 21): “…  Notably, there is no mention of: (1) 'edge weight adjustments' are added to (2) 'the updated edge weights between nodes', as claimed. Instead, 'derivatives' are used to update (i) weights and (ii) means and variances of the posterior approximation. Thus, it is further shown that Claim 6 has been erroneously rejected due to such 'edge weight adjustments' added to 'updated edge weights' prima facie obviousness deficiencies.” 

Examiner answers: The derivatives (i.e., the gradient) are edge weight adjustments. Because the weights in PBP (probabilistic backpropagation) are random, with a Gaussian distribution used to approximate each weight, the PBP update rule accordingly updates the means and variances of that distribution; i.e., this distribution represents the weights between nodes, which are adjusted by adding the derivatives [edge weight adjustments]. Therefore, Hernandez-Lobatoc teaches the claimed invention.

Appellant argues: (see p. 33 top, claims 6 and 21): “Failure to Comply with US Supreme Court Precedent…” 

Examiner answers: The Appellant does not provide any details as to why the rationale for combining the prior art of Hernandez-Lobatoc is improper; therefore, the rationale remains as set forth in the Office Action.

Appellant argues: (see p. 35 top and middle, claims 8 and 23): “… Notably, there is no mention of processing associated with an intensity level of 'sensor data' received from an 'object', as claimed.” 

Examiner answers: The elements ‘sensor data’ and ‘object’ have been addressed in the rejection of claim 1. The sensor data is the crack growth/size (see Rastogi, [0027]), and the object is the structural component exhibiting damage, fatigue, or other degradation (see Rastogi, [0027]). Nechval is prior art related to the fatigue crack growth process in aircraft; therefore, Nechval is combined and relied upon to teach the new features of claim 8 that were not addressed in claim 1. (In Nechval, the crack size and the aircraft can also be viewed as ‘sensor data’ and ‘object’, respectively.)

Appellant argues: (see p. 35 middle, claims 8 and 23): “…In contrast, the claimed 'inputting' is performed 'responsive to' a particular condition (intensity level determination regarding 'sensor' data received from an 'object'). Thus, it is urged that Claim 8 has been erroneously rejected due to such 'responsive to' based prima facie obviousness deficiencies.” 

Examiner answers: The crack size datasets [sensor data] used in Nechval are greater than a specified threshold. Specifically, (1) in the three regions of Fig. 3, the data are greater than Kth, the fatigue threshold; (2) in Fig. 7, the cracked component inspected for extended service life has a crack size greater than a°; and (3) in Table 1 on p. 10, a (mm) is greater than 7.0 mm (Nechval, p. 10 “The pre-made crack length was 7.0 mm. Crack growing length was monitored by microscope.”). In other words, only when the data are greater than the specified threshold [responsive to a particular condition] are those data used in the artificial neural network (ANN) technique [the inputting is performed].
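The threshold-conditioned inputting described above can be sketched as follows, for illustration only. The 7.0 mm threshold mirrors the pre-made crack length quoted from Nechval; the measurement values and the stand-in model are hypothetical and are not data from Nechval's Table 1. The model is invoked only on measurements that satisfy the threshold condition.

```python
def filter_by_threshold(crack_sizes_mm, threshold_mm=7.0):
    """Keep only measurements strictly greater than the threshold."""
    return [a for a in crack_sizes_mm if a > threshold_mm]

def input_responsive_to_threshold(crack_sizes_mm, model, threshold_mm=7.0):
    """Pass data to the (hypothetical) ANN model only when the threshold condition holds."""
    selected = filter_by_threshold(crack_sizes_mm, threshold_mm)
    return [model(a) for a in selected]

measurements = [6.5, 7.2, 8.9, 7.0]     # hypothetical crack sizes in mm
predictions = input_responsive_to_threshold(measurements, model=lambda a: a)  # identity stand-in model
```

A measurement exactly equal to the threshold is excluded here because the comparison is strict; whether the boundary value is included is a design choice.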

Appellant argues: (see p. 36 top, claims 8 and 23): “… Notably, Kendall does not describe on page 3 making a determination as to which node in each respective hidden layer is to be randomly dropped out, as claimed. Instead, a model is trained 'with dropout'. As to the teachings of Goel, 'inputs and outputs' of a node are dropped - but not the 'node' itself. Thus, it is further urged that Claim 8 has been erroneously rejected due to such node based randomly dropped out based prima facie obviousness deficiencies.” 

Examiner answers: The examiner does not rely on Kendall to teach this feature; Goel teaches it. Dropout does not necessarily require deleting neurons/nodes. The broadest reasonable interpretation (BRI) of dropout includes deactivating neurons temporarily and randomly by zeroing the multiplication operations on their inputs or outputs: if the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel also teaches that a node is dropped out in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed invention.
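The equivalence relied upon above can be illustrated with a minimal sketch, assuming a single fully connected linear layer without bias; the function names, weights, and inputs are hypothetical and the code is not taken from Goel. Multiplying a node's output by zero yields the same layer output as deleting that node together with all of its outgoing connections.

```python
def layer_output(inputs, weights, mask=None):
    """Linear layer: weights[j][i] is the edge from input node i to output node j.
    mask[i] == 0 zeroes node i's output (dropout); mask[i] == 1 keeps it."""
    if mask is None:
        mask = [1] * len(inputs)
    dropped = [x * m for x, m in zip(inputs, mask)]
    return [sum(w * x for w, x in zip(row, dropped)) for row in weights]

def layer_output_without_node(inputs, weights, k):
    """Same layer with node k and all of its outgoing edges removed from the graph."""
    keep = [i for i in range(len(inputs)) if i != k]
    return [sum(row[i] * inputs[i] for i in keep) for row in weights]

x = [1.0, 2.0, 3.0]                             # hypothetical node outputs
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]       # hypothetical edge weights
masked = layer_output(x, W, mask=[1, 0, 1])     # node 1 dropped by zeroing its output
removed = layer_output_without_node(x, W, k=1)  # node 1 deleted with its edges
```

Both calls return [6.5, 0.0], while the full layer without dropout returns [4.5, 0.5].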

Appellant argues: (see p. 36 top, claims 8 and 23): “…Kendal and Goel also do not describe 'simulating' an unknown event, and therefore Kendall/Goel do not describe dropping out a node 'to simulate an unknown event', as claimed Thus, it is further shown that Claim 8 has been erroneously rejected due to such 'unknown event simulation' based prima facie obviousness deficiencies.” 

Examiner answers: Kendall teaches this feature on p. 2, as cited in the Office Action (Kendall, p. 2 “our model exhibits increased aleatoric uncertainty on object boundaries and for objects far from the camera.”). The claim does not recite any details of the unknown event. Therefore, simulating uncertainty [unknown event simulation] on object boundaries and for distant objects is viewed as simulating the unknown event.

Appellant argues: (see p. 36 middle, claims 8 and 23): “Non-Analogous Art… The claimed invention is directed to a system for simulating 'uncertainty' in an artificial neural network. In contrast, the cited Nechval reference is directed to predicting a 'fatigue crack growth process' using an artificial neural network technique. Accordingly, the cited Nechval reference is not in the same field of endeavor as the claimed invention.” 

Examiner answers: Nechval is relied upon to teach data pre-processing before the data are fed to a model, i.e., filtering out data that do not satisfy the threshold constraint and then providing the filtered data to a neural network model. Nechval is not relied upon for the feature of simulating uncertainty; that feature is taught by Kendall.

Appellant argues: (see p. 37 middle, claims 8 and 23): “Failure to Comply with US Supreme Court Precedent…” 

Examiner answers: The Appellant does not provide any details as to why the rationale for combining the prior art of Nechval is improper; therefore, the rationale remains as set forth in the Office Action.

Appellant argues: (see p. 39 middle, claim 9): “… Thus, it is urged that Claim 9 has been erroneously rejected since Gal does not describe 'selecting' a hidden layer… Instead, Goel describes… Thus, it is urged that Claim 9 has been erroneously rejected due to such hidden layer selection-based prima facie obviousness deficiencies… Notably, there is no mention of a particular 'identifying' step/action with respect to a 'probability density function' corresponding to a particular selected 'hidden layer', as claimed. Instead, characteristics of 'dropout' are described. Thus, it is urged that Claim 9 has been erroneously rejected due to such 'identifying' based prima facie obviousness deficiencies.” 

Examiner answers: The examiner relies on Gal, not Goel, to teach this feature. Gal teaches that, for each of the weight layers (layers 1..L), q(w) is identified using the KL term, and q(w) is a probability density function, for each hidden layer separately, that models an output/posterior of the ANN/BNN. In other words, for every ‘selected’ hidden layer, there is a corresponding probability density function q(w) ‘identified’ by the KL term.
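For illustration only, the per-layer correspondence between hidden layers and their densities q(W_l) can be sketched as below. The sketch assumes that each q(W_l) factorizes into independent Gaussians N(mu, sigma^2), one per weight, with a standard normal prior; this is a simplifying assumption and not necessarily the variational family used in Gal. Each layer receives its own closed-form KL term, so selecting layer l identifies its own q(W_l).

```python
import math

def gaussian_kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for one scalar weight, in closed form."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - math.log(sigma ** 2))

def per_layer_kl(q_params):
    """q_params[l] is a list of (mu, sigma) pairs describing the factorized density q(W_l)
    of layer l. Returns one KL value per layer."""
    return [sum(gaussian_kl_to_standard_normal(mu, s) for mu, s in layer) for layer in q_params]

q = [[(0.0, 1.0)], [(1.0, 1.0), (0.0, 1.0)]]   # hypothetical two-layer variational parameters
kl = per_layer_kl(q)
selected_layer = 1
kl_selected = kl[selected_layer]               # KL term for the density of the selected layer
```

A layer whose density equals the prior contributes zero KL, and moving a mean away from zero increases that layer's KL term.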

Appellant argues: (see p. 40 middle, claim 9): “… Failure to Comply with US Supreme Court Precedent#1… as the Examiner provides no reasons at all… Failure to Comply with US Supreme Court Precedent#2…” 

Examiner answers: The rationale for combining the teachings of Kendall, Hernandez-Lobatoc, Goel, Rastogi and Gal is the same as set forth in the rejection of claim 2. These features of Gal were already incorporated into the combination, as described in the rejection of claim 2; the rejection of claim 9 does not propose a further modification of the invention, but only demonstrates how the rationale for the combination set forth for claim 2 covers the limitations of the dependent claims.

Appellant argues: (see p. 42 middle, claim 10): “… There is no mention of (1) a specific 'selecting' step/action, (2) an actual 'node' to be randomly dropped, or (3) 'selecting' an actual 'node' to be randomly dropped out. Instead, 'aggregation' is implemented. There is no mention of (4) an actual 'applying' step/action, (5) an 'intensity level' of 'sensor data' received from an 'object', (6) a specially identified 'probability density function', or (7) applying an 'intensity level' of the 'sensor data' received from the 'object' to a specially identified 'probability density function', as claimed. Instead, a 'probability distribution' is adjusted…” 

Examiner answers: Dropout does not necessarily require deleting neurons/nodes. It deactivates neurons temporarily and randomly by zeroing the multiplication operations on their inputs or outputs: if the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel also teaches that a node is dropped out in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, (1) if a node is dropped out, that node has been ‘selected’ to be dropped/removed; (2) dropping out nodes means that at least one actual node is randomly dropped; and (3) combining (1) and (2), an actual ‘node’ is ‘selected’ to be randomly dropped.

Further, Goel teaches the ‘apply’ feature in [0030] and [0067] (“the probability distribution over dropout rates… dropout rate parameter) that may optionally depend on the current state… of input”). (4) Because the dropout rate parameter depends on the current state of the input, Goel teaches that dropping out is based on ‘applying’ the current state [intensity level] of the input data. (5) Goel teaches the current state [intensity level] of input data, and Rastogi teaches ‘sensor data’ received from an ‘object’. (6) Because Goel teaches that the dropout rates follow an adjustable probability distribution, that distribution over dropout rates is the 'probability density function'. (7) Combining all of the features above, Goel and Rastogi teach “applying an 'intensity level' of the 'sensor data' received from the 'object' to a specially identified 'probability density function'.”

Appellant argues: (see p. 43 middle, claim 10): “… Kendall describes… Notably, there is no mention of (1) a specially selected 'hidden layer', or (2) a specially 'selected node' within the 'selected hidden layer', as claimed. Instead, a model is trained with 'dropout', and 'dropout' is performed as part of a test... Goel describes… Notably, there is no mention of (1) a specially selected 'hidden layer', or (2) a specially 'selected node' within the 'selected hidden layer', as claimed. Instead, 'inputs and outputs' of a layer or node are dropped out. Thus, it is further urged that Claim 10 has been erroneously rejected due to such selected node/selected hidden layer-based prima facie obviousness deficiencies.” 

Examiner answers: As indicated in the Office Action, Kendall teaches dropping out to simulate epistemic uncertainty, and Goel teaches the selected node within the selected hidden layer. Therefore, as to arguments (1) and (2) regarding Kendall, the examiner does not rely on Kendall to teach these features. As to arguments (1) and (2) regarding Goel, Goel teaches (Goel [0006] "'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network... to tune network parameters (number of layers, number of nodes per layer”). Because the number of layers and the number of nodes per layer subject to dropout are tunable parameters, tuning the number of layers can be viewed as ‘selecting’ a layer, and tuning the number of nodes per layer can be viewed as the ‘selected node in the selected layer.’ Further, Goel also teaches that a node is dropped out in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed invention.

Appellant argues: (see p. 44 top, claim 10): “… The combined teachings of the cited references do not describe 'applying' an intensity level of sensor data to a specially identified 'probability density function'… Thus, it is urged that Claim 10 has been erroneously rejected due to such node selecting 'based on' prima facie obviousness deficiencies since Goel describes dropping out ‘inputs or outputs’ of a node based on a 'fixed percentage'.” 

Examiner answers: Goel teaches the ‘apply’ feature in [0030] and [0067] (“the probability distribution over dropout rates… dropout rate parameter) that may optionally depend on the current state… of input”). Because the dropout rate is governed by a probability function whose parameters are determined based on the current state of the input, Goel teaches that node selection for dropout is ‘based on’ ‘applying’ the current state [intensity level] of the input data ‘to’ the probability function of the dropout rate.
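The mechanism described above, a probability distribution over dropout rates whose parameter depends on the current input state, can be sketched as follows for illustration only. The linear mapping from intensity to mean rate, the Beta distribution, the concentration value, and all function names are hypothetical assumptions, not Goel's disclosed implementation. A dropout rate is drawn from a distribution parameterized by the input intensity, and each node of the selected hidden layer is then selected for dropout with that probability.

```python
import random

def mean_dropout_rate(intensity, base_rate=0.1, scale=0.5):
    """Hypothetical: mean dropout rate grows with the input intensity (assumed in [0, 1]),
    clipped to stay strictly inside (0, 1) so the Beta parameters below remain positive."""
    return min(max(base_rate + scale * intensity, 0.01), 0.99)

def sample_dropout_rate(intensity, rng, concentration=20.0):
    """Draw a dropout rate from a Beta distribution whose mean depends on the input state."""
    m = mean_dropout_rate(intensity)
    return rng.betavariate(m * concentration, (1.0 - m) * concentration)

def select_dropped_nodes(num_nodes, intensity, rng, concentration=20.0):
    """Select nodes of one hidden layer for dropout: one Bernoulli(rate) draw per node."""
    rate = sample_dropout_rate(intensity, rng, concentration)
    return [j for j in range(num_nodes) if rng.random() < rate]

rng = random.Random(0)                                    # seeded for reproducibility
dropped = select_dropped_nodes(8, intensity=0.8, rng=rng)  # indices of nodes to drop
```

Seeding a dedicated random.Random instance keeps the sampling reproducible without affecting the module-level generator.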

Appellant argues: (see p. 44 top, claim 10): “Failure to Comply with US Supreme Court Precedent#1…Such motivation assertion does not meet the articulated reasoning with rational underpinning mandate since the combined teachings prior to the Rastogi addition do not describe any desire 'to determine crack growth progression and remaining used life of an aircraft in a real world situation'. The Examiner is therefore impermissibly using advantages described by Rastogi as the motivation for modifying the combination - and yet such combination does not have the features that Rastogi improves upon. The Examiner also fails to describe or explain why a person of ordinary skill in the art would have been motivated to incorporate the features of either of the cited Nechval or Gal references that are used in rejecting Claim 10 to the remaining combination of references Failure to Comply with US Supreme Court Precedent#2…” 

Examiner answers: As cited in the Office Action (Rastogi, [0027] "the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression…”; also see [0028]), Rastogi teaches using uncertainty modeling to predict crack growth progression and remaining usage life. Other prior art references teach related techniques: Kendall teaches using a neural network for uncertainty modeling, and Goel teaches dropout training for neural networks. Therefore, applying the neural-network-based uncertainty model of Kendall and the dropout training of Goel to the real-world sensor data/measurements of Rastogi would improve the uncertainty modeling of those sensor data and, as a result, provide better prediction of crack growth progression and remaining usage life.

For the above reasons, it is believed that the rejections should be sustained.









Respectfully submitted,
/S.C./Examiner, Art Unit 2122
Conferees:
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122
/RYAN M STIGLIC/Primary Examiner

Requirement to pay appeal forwarding fee. In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.