DETAILED ACTION
This action is in response to the communications filed on 10/19/2021. Claims 1-25 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Appeal
In view of the appeal brief filed on 10/19/2021, PROSECUTION IS HEREBY REOPENED. New grounds of rejection are set forth below.
To avoid abandonment of the application, appellant must exercise one of the following two options:
(1) file a reply under 37 CFR 1.111 (if this Office action is non-final) or a reply under 37 CFR 1.113 (if this Office action is final); or,
(2) initiate a new appeal by filing a notice of appeal under 37 CFR 41.31 followed by an appeal brief under 37 CFR 41.37.  The previously paid notice of appeal fee and appeal brief fee can be applied to the new appeal.  If, however, the appeal fees set forth in 37 CFR 41.20 have been increased since they were previously paid, then appellant must pay the difference between the increased fees and the amount previously paid.
A Supervisory Patent Examiner (SPE) has approved of reopening prosecution by signing below:
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122

Claim Rejections - 35 USC § 103
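In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.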
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 4, 7, 11, 13-14, 16, 18-19, 22 and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Kendall ("What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?") in view of Hernandez-Lobatoc ("Probabilistic backpropagation for scalable learning of bayesian neural networks") in view of Goel (US 20160307096 A1) in further view of Rastogi (US 20170351966 A1).
In regard to claims 1, 11, 16, 24 and 25, Kendall teaches: A computer-implemented method for simulating uncertainty in an artificial neural network, the computer-implemented method comprising: (Kndl., p. 1 "We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models [an artificial neural network] for vision tasks."; p. 3 "To capture epistemic uncertainty in a neural network (NN)… Such a model is referred to as a Bayesian neural network (BNN)...")
simulating… aleatoric uncertainty to measure what the artificial neural network does not understand… (Kndl., p. 1 "There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations... Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models [the artificial neural network] for vision tasks."; "In Bayesian modeling, there are two main types of uncertainty one can model [7]. Aleatoric uncertainty captures noise inherent in the observations. This could be for example sensor noise or motion noise, resulting in uncertainty which cannot be reduced even if more data were to be collected.")

simulating… epistemic uncertainty to measure what the artificial neural network does not know (Kndl. p. 1, "On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data.") by dropping out... from each respective layer of the artificial neural network during forward propagation of the sensor data and (Kndl., p. 3 "Dropout variational inference is a practical approach for approximate inference in large and complex models [15]. This inference is done by training a model with dropout before every weight layer [each respective layer of the artificial neural network], and by also performing dropout at test time to sample from the approximate posterior (stochastic forward passes [forward propagation of the sensor data], referred to as Monte Carlo dropout).") measuring impact of dropped out nodes on the output data of the artificial neural network; and (Kndl., p. 3 "Denoting the random output of the BNN as fW(x)"; "The minimisation objective is given by… with N data points, dropout probability p … σ the model’s observation noise parameter – capturing how much noise we have in the outputs… by marginalising over the (approximate) weights posterior distribution"; The objective function including the dropout rate p is used to measure the performance of the BNN, f is a Bayesian convolutional neural network parametrised by model weights W (p.5).)
by adding random values to edge weights between nodes in the artificial neural network during backpropagation of output data of the artificial neural network and (H.L., 3. Probabilistic backpropagation "In the second phase, the derivatives of the training loss with respect to the weights are propagated back from the output layer [during backpropagation of output data of the artificial neural network] towards the input. These derivatives are used to update the weights using, e.g., stochastic gradient descent with momentum... since the weights [edge weights] are now random... In the second phase, the gradients of this quantity with respect to the means and variances of the approximate Gaussian posterior are propagated back using reverse-mode differentiation as in classic backpropagation. These derivatives are finally used to update the means and variances of the posterior approximation... PBP uses the following property of Gaussian distributions... Let f(w) encode an arbitrary likelihood function for the single synaptic weight |w| given some data and let our current beliefs regarding the scalar |w| be captured by a distribution q(w) = N(w|m,v)."; Instead of just w, now single synaptic weights have mean and variance in a distribution, which are values describing the randomness of w.) measuring impact on the output data by the added random values to the edge weights between the nodes; (H.L., p. 3 "The update rule used by PBP... PBP uses the following property of Gaussian distributions... After seeing the data, our beliefs about w are updated according to Bayes’ rule..."; "Given a new input vector x*, we can then make predictions for its output y* using the predictive distribution given by..."; Weights are updated to impact the output data of the neural network model.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to train the Bayesian deep learning model used in Kendall with the scalable method called probabilistic backpropagation (PBP), as taught by Hernandez-Lobatoc. Doing so would be significantly faster than other techniques, while offering competitive predictive abilities. (Kendall, abstract "We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks."; H.L., Abstract "a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP).… PBP is significantly faster than other techniques, while offering competitive predictive abilities.")
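For illustration only, the two sampling mechanisms mapped above (dropout kept active during forward passes, and random variation on edge weights) can be sketched on a toy network. This is a minimal sketch under stated assumptions, not the method of Kendall or of Hernandez-Lobatoc: the function names, the two-layer ReLU network, the inverted-dropout scaling, and the use of zero-mean Gaussian weight noise are all hypothetical choices made for the example. In particular, the weight noise is drawn on every forward pass here, whereas PBP maintains a mean and variance per weight and updates them during backpropagation.

```python
import random
import statistics


def forward(x, layers, drop_p=0.0, weight_std=0.0, rng=random):
    """One stochastic forward pass through a small fully connected network.

    layers     : list of weight matrices; each matrix is a list of rows,
                 one row of input weights per output node (hypothetical layout).
    drop_p     : probability of zeroing each hidden node (dropout at test time).
    weight_std : standard deviation of zero-mean Gaussian noise added to
                 every edge weight (assumption made for this sketch).
    """
    h = list(x)
    for index, weights in enumerate(layers):
        # Weighted sum per output node, with optional noise on each edge weight.
        out = [
            sum((w + rng.gauss(0.0, weight_std)) * v for w, v in zip(row, h))
            for row in weights
        ]
        is_last = index == len(layers) - 1
        if not is_last:
            out = [max(0.0, v) for v in out]  # ReLU on hidden layers
            if drop_p > 0.0:
                # Inverted dropout: zero a node with probability drop_p,
                # rescale survivors so the expected activation is unchanged.
                out = [
                    0.0 if rng.random() < drop_p else v / (1.0 - drop_p)
                    for v in out
                ]
        h = out
    return h[0]


def predictive_stats(x, layers, passes=200, seed=0, **noise):
    """Mean and spread of the output over repeated stochastic passes.

    The spread measures how much the dropped nodes or perturbed weights
    move the output.
    """
    rng = random.Random(seed)
    samples = [forward(x, layers, rng=rng, **noise) for _ in range(passes)]
    return statistics.mean(samples), statistics.pstdev(samples)
```

Calling `predictive_stats(x, layers, drop_p=0.5)` or `predictive_stats(x, layers, weight_std=0.1)` returns a nonzero spread whenever the perturbation reaches the output; with both arguments left at zero the passes are identical and the spread is zero.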

Kendall and Hernandez-Lobatoc do not teach, but Goel teaches: simulating… epistemic uncertainty… by dropping out a selected node from each respective layer of the artificial neural network (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a selected node from each respective layer of the artificial neural network], number of training iterations, learning rate, etc.)."; Kendall teaches epistemic uncertainty.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the dropout in the neural network, as taught by Kendall, by tuning the number of nodes selected per layer for dropout, as taught by Goel. The motivation to do so is to improve the neural network performance. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training)…")

Kendall, Hernandez-Lobatoc and Goel do not teach, but Rastogi teaches: by a computer (Rastogi, [0025] "The processing system 202 generally represents the hardware, circuitry, processing logic, and/or other components of the computing device 200")
… from sensor data received from an object operating in a real-world environment (Rastogi, [0014] "By improving accuracy and reliability of the RUL metrics, condition-based maintenance (CBM) can be more effectively utilized to schedule and perform maintenance on a structural component [the object] exhibiting damage, fatigue, or other degradation."; [0018] "Still referring to FIG. 1, the measurement system 106 may include or otherwise utilize one or more measurement devices to obtain a current measurement of a physical defect [data from sensors / sensor data], such as, for example, X-ray devices, fiber Bragg grating (FBG) sensors, microscopes, probes, wireless sensor networks... capable of analyzing the structural component 102 to quantify or otherwise characterize the physical defect."; [0015] "… in the context of aviation applications [an operating environment of the aircraft / a real-world environment], where the structural component being analyzed is a structural member of an aircraft")
performing, by the computer, an action corresponding to the object sending the sensor data and (Rastogi, [0014] "By improving accuracy and reliability of the RUL metrics, condition-based maintenance (CBM) can be more effectively utilized to schedule and perform maintenance [performing an action] on a structural component [the object] exhibiting damage, fatigue, or other degradation."; [0018] "Still referring to FIG. 1, the measurement system 106 may include or otherwise utilize one or more measurement devices to obtain a current measurement of a physical defect [data from sensors / sensor data], such as, for example, X-ray devices, fiber Bragg grating (FBG) sensors, microscopes, probes, wireless sensor networks... capable of analyzing the structural component 102 to quantify or otherwise characterize the physical defect.")
operating in the real-world environment (Rastogi, [0015] "… in the context of aviation applications, where the structural component being analyzed is a structural member of an aircraft"; [0019] "the degradation development module 120 calculates or otherwise determines one or more stress intensity factor values based on... the estimated loading on the component 102 during operation of the aircraft 104 [operating in the real-world environment]"; [0031] "the loading data 218 may be empirically obtained (e.g., based on sensors or other devices or systems onboard an aircraft during one or more flights)")
based on the impact of simulating the aleatoric uncertainty and the epistemic uncertainty. (Rastogi, [0027] "the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression..."; [0028] "The uncertainty modeling application 220 also includes an epistemic uncertainty module 224 that calculates or otherwise determines confidence limits (or bounds) for the RUL output value"; abstract "The maintenance system determines maintenance schedule or other remedial action(s) for the structure in a manner that is influenced by the reference remaining usage life metric [impact of RUL]"; CBM is performed based on the influence of RUL (i.e. a maintenance action is performed based on the impact of RUL, which is determined by the aleatoric uncertainty and the epistemic uncertainty))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to implement the system of Kendall/Hernandez-Lobatoc/Goel on a computer, as taught by Rastogi. The motivation to do so is automation. Further, it would have been obvious to apply the uncertainty model of Kendall/Hernandez-Lobatoc and the dropout training of neural networks of Goel to real-world sensor data/measurements, as taught by Rastogi. The motivation to do so is to determine crack growth progression and remaining usage life. (Rastogi, [0027] "the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression that accounts for uncertainty or variances associated with the current crack length measurement, the rivet forces, the stress intensity factors…"; [0014] "… systems and methods for determining probabilistic remaining usage life (RUL) metrics for structural components that account for both measurement uncertainties…")

In addition, Rastogi teaches: (claim 11) A computer system for simulating uncertainty in an artificial neural network, the computer system comprising: a bus system; (Rastogi, [0023] "The illustrated computing device 200 includes a communications interface 204, which generally represents the hardware, circuitry, and/or other components of the computing device 200 that are coupled to the processing system 202…")
a storage device connected to the bus system, wherein the storage device stores program instructions; and a processor connected to the bus system, wherein the processor executes the program instructions to (Rastogi, [0025] "the processing system 202 may be implemented or realized with a general purpose processor, a controller, a microprocessor, a microcontroller... the processing system 202 includes or otherwise accesses the data storage element 208, such as a memory (e.g., RAM memory, ROM memory, flash memory, registers, a hard disk, or the like)")
(claim 16) A computer program product for simulating uncertainty in an artificial neural network, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: (Rastogi, [0025] "the processing system 202 may be implemented or realized with a general purpose processor, a controller, a microprocessor, a microcontroller... the processing system 202 includes or otherwise accesses the data storage element 208, such as a memory (e.g., RAM memory, ROM memory, flash memory, registers, a hard disk, or the like)")
Claims 24 and 25 are broader versions of claim 1; therefore, the rejection applied to claim 1 also applies to claims 24 and 25.

In regard to claims 3, 13 and 18, Kendall, Hernandez-Lobatoc, Goel and Rastogi teach: The computer-implemented method of claim 1 further comprising: generating, by the computer, an output of the artificial neural network based on simulating the aleatoric uncertainty and the epistemic uncertainty. (Kndl., p. 4, 3 Combining Aleatoric and Epistemic Uncertainty in One Model, "We develop models that will allow us to study the effects of modeling either aleatoric uncertainty alone, epistemic uncertainty alone, or modeling both uncertainties together in a single model.")

In regard to claims 4, 14 and 19, Kendall, Hernandez-Lobatoc, Goel and Rastogi teach: The computer-implemented method of claim 1 further comprising: running, by the computer, the artificial neural network that includes a plurality of hidden layers using labeled... data samples (H.L. p. 2, 2. Probabilistic neural network models "We describe a probabilistic model for data based on a feedforward neural network. Given data D = {xn, yn}... corresponding scalar target variables yn [labeled data samples]... We denote the outputs of the layers by vectors {zl} l = 0..L, where z0 is the input layer, {zl} l = 1..L-1 are the hidden units [hidden layers] and zL denotes the output layer...")
… using labeled sensor data samples corresponding to the real-world environment; and (Kndl., p. 5 "We fix a Gaussian likelihood to model our aleatoric uncertainty. This induces a minimisation objective given labeled output points x... where D is the number of output pixels yi [labeled sensor data] corresponding to input image x"; p. 2 "Figure 1 : Illustrating the difference between aleatoric and epistemic uncertainty for semantic segmentation on the CamVid dataset [sensor data][8]. Aleatoric uncertainty captures noise inherent in the observations. In (d) our model exhibits increased aleatoric uncertainty on object boundaries and for objects far from the camera."; camera is a sensor. see images from Figure 1 [real-world environment]; Both Kendall and H.L. (previous limitation) teach using labeled data when running neural network, and both Kendall and Rastogi (in claim 1) teach sensor data corresponding to real-world environment.)

utilizing, by the computer, an obtained output of the artificial neural network to determine model error based on a delta between a target output and the obtained output. (Kndl., p. 5 "… where D is the number of output pixels yi [target output / labeled data] corresponding to input image x, indexed by i..."; f or ŷ is the output of the neural network, and LBNN is the model error; yi - ŷi is the delta between the target output and the obtained output.)
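For illustration only, a model error built from the delta yi - ŷi under a Gaussian likelihood can be sketched as follows. The objective relied on above was reproduced as an image that is not available in this text, so the expression below is an assumption: a heteroscedastic Gaussian negative log-likelihood in which each output carries a predicted log-variance. The function and parameter names are hypothetical.

```python
import math


def heteroscedastic_loss(targets, predictions, log_variances):
    """Average Gaussian negative log-likelihood over D outputs (up to a constant).

    targets       : target outputs y_i (labeled data)
    predictions   : obtained outputs y_hat_i of the network
    log_variances : predicted s_i = log(sigma_i^2), one per output
                    (assumed parameterisation for this sketch)
    """
    total = 0.0
    for y, y_hat, s in zip(targets, predictions, log_variances):
        delta = y - y_hat  # delta between target output and obtained output
        # Residual term is down-weighted where predicted noise is high;
        # the 0.5 * s term penalises claiming high noise everywhere.
        total += 0.5 * math.exp(-s) * delta * delta + 0.5 * s
    return total / len(targets)
```

With all log-variances set to zero the expression reduces to half the mean squared delta, which makes the role of the delta in the model error explicit.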

In regard to claims 7 and 22, Kendall, Hernandez-Lobatoc, Goel and Rastogi teach: The computer-implemented method of claim 1 further comprising: receiving, by the computer, the sensor data from the object operating in the real-world environment; (Rastogi, [0014] "By improving accuracy and reliability of the RUL metrics, condition-based maintenance (CBM) can be more effectively utilized to schedule and perform maintenance on a structural component [the object] exhibiting damage, fatigue, or other degradation."; [0018] "Still referring to FIG. 1, the measurement system 106 may include or otherwise utilize one or more measurement devices to obtain a current measurement of a physical defect [data from sensors / sensor data], such as, for example, X-ray devices, fiber Bragg grating (FBG) sensors, microscopes, probes, wireless sensor networks... capable of analyzing the structural component 102 to quantify or otherwise characterize the physical defect."; [0015] "… in the context of aviation applications [an operating environment of the aircraft / a real-world environment], where the structural component being analyzed is a structural member of an aircraft")
determining, by the computer, an intensity level of the sensor data; and (Rastogi, [0020] "... an uncertainty analysis module 122 that receives the degradation development data from the degradation development module 120 and calculates or otherwise determines a probabilistic representation of the degradation development data [an intensity level of the sensor data] using the maintenance threshold"; [0005] "The method involves determining degradation development data for the physical defect based at least in part on a current measurement of the physical defect"; [0015] "The degradation development data set is then utilized to create a probabilistic representation of the mechanical defect determined based at least in part on the degradation development data and uncertainties associated with the measurement of the mechanical defect, the rivet forces, and/or other constituent parameters used to derive the degradation development data."; current measurements are obtained from sensors.)
determining, by the computer, whether the intensity level of the sensor data is greater than or equal to an intensity level threshold level indicating occurrence of an unknown event. (Rastogi, [0020] "the probabilistic representation represents the probability of the size of the physical defect exceeding the maintenance threshold [greater than an intensity level threshold level] after a certain number of cycles (or time intervals) of operation of the aircraft 104."; [0019] "the degradation development data represents a prediction or prognostication of the progression of the physical defect in the future [occurrence of an unknown event]... the degradation development module 120 determines a crack growth data set representing the predicted progression of the crack based on the current measurement of the crack length received from the measurement system 106 and the force or stress the component 102 is subjected to during operation of the aircraft 104")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to filter the data input to the uncertainty analysis, as taught by Kendall, with the threshold value, as taught by Rastogi. The motivation to do so is to ensure the data used for analysis are sufficiently large or statistically significant for maintenance. (Rastogi, [0019] "the degradation development module 120 calculates or otherwise determines incremental amounts of growth in the crack length (or progression of the crack) on a per cycle…based on the preceding crack length measurement… until reaching a maintenance threshold…")
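For illustration only, the threshold comparison recited for claims 7 and 22, and the probability of exceeding a maintenance threshold described in Rastogi [0020], can be sketched as below. This is a minimal sketch; the function names and the use of a plain list of sampled values are hypothetical and are not drawn from Rastogi or from the claims.

```python
def meets_threshold(intensity, threshold):
    """True when the intensity level is greater than or equal to the threshold."""
    return intensity >= threshold


def exceedance_probability(samples, threshold):
    """Fraction of sampled values (e.g. predicted defect sizes) at or above
    the threshold; a simple empirical estimate, assumed for this sketch."""
    if not samples:
        raise ValueError("at least one sample is required")
    hits = sum(1 for value in samples if meets_threshold(value, threshold))
    return hits / len(samples)
```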
Claims 2, 5, 6, 12, 15, 17, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kendall in view of Hernandez-Lobatoc in view of Goel in view of Rastogi in further view of Gal ("Concrete Dropout"). 
In regard to claims 2, 12 and 17, Kendall, Hernandez-Lobatoc and Goel teach: The computer-implemented method of claim 1 further comprising: 
selecting, by the computer, a node to be randomly dropped from each layer of the artificial neural network... for Monte Carlo dropout sampling; and (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a node from each layer of the artificial neural network], number of training iterations, learning rate, etc.)."; [0061] "An applied dropout rate may again be sampled (e.g., re-sampled) by the sampler 304 to evaluate generalization performance of the current training iteration."; [0065] "Annealing the dropout rate (e.g., viewed as a temperature parameter) using the adjuster 306 is an effective way to mitigate against the poor solutions. Dropout training can be viewed as a Monte Carlo approach that optimizes the expected loss over the ensemble of models formed by all possible dropout masks over node outputs (e.g., a Bayesian objective). In one embodiment, a stochastic method for annealed dropout [randomly dropped] may be employed, and this method may do more than gradually increase the theoretical capacity of the network."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. each layer] of the network.")
for each respective layer of the artificial neural network:… selecting, by the computer, the node to be randomly dropped from that particular layer based on applying an intensity level of the… (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. dropout rate for the particular hidden layer], number of training iterations, learning rate, etc.)."; [0030] "In an embodiment, the dropout rate may be annealed, and the probability distribution over dropout rates [the probability density function] (and other parameters / hyperparameters) may be adjusted / evolved."; [0067] "For example, if the dropout rate is a function only of the training epoch t, a general formulation according to one embodiment of the present invention may be: p_d[t] = p_d[t−1] + α_t(θ) (4) where 0 ≦ p_d[t] ≦ 1 is the dropout probability [probability function] at epoch t, and α_t(θ) is an annealing rate parameter (e.g., dropout rate parameter) that may optionally depend on the current state (or estimate of the state) of auxiliary inputs/parameters θ [an intensity level of the data] (including, for example, p_d[t′] for t′<t). It is noted that the term 'annealing' implies that 0 ≦ α_t ≦ 1..."; [0086] "As mentioned above, hyperparameters (e.g., the dropout rate) can be reduced or increased so that the next iteration fits the training data more appropriately."; Goel teaches: dropping out is performed based on applying an intensity level of the input data to the probability density function)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the dropout in the neural network, as taught by Kendall, by tuning the number of nodes selected per layer for dropout, as taught by Goel. The motivation to do so is to improve the neural network performance. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training)…")
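For illustration only, the dropout-rate recurrence quoted from Goel [0067], in which the rate at epoch t equals the rate at epoch t−1 plus an annealing term, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the callable `alpha(t, previous_rate)` standing in for the annealing term and its auxiliary inputs, and the clamping used to keep the rate in [0, 1] are hypothetical. The quoted passage is truncated, so the sign and size of the step are left to the caller.

```python
def anneal_dropout(initial_rate, alpha, epochs):
    """Dropout probability per epoch: rate[t] = rate[t-1] + alpha(t, rate[t-1]).

    initial_rate : dropout probability at epoch 0
    alpha        : callable returning the annealing step for epoch t;
                   its second argument stands in for the auxiliary
                   inputs (hypothetical interface)
    epochs       : number of update steps to apply
    """
    rates = [initial_rate]
    for t in range(1, epochs + 1):
        updated = rates[-1] + alpha(t, rates[-1])
        # Clamp so the result remains a valid probability.
        rates.append(min(1.0, max(0.0, updated)))
    return rates
```

For example, `anneal_dropout(0.5, lambda t, p: 0.1, 3)` yields four rates that rise by roughly 0.1 per epoch, and any schedule that would leave [0, 1] is held at the boundary.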

Kendall, Hernandez-Lobatoc and Goel do not teach, but Rastogi teaches: … by using the sensor data corresponding to the real-world environment (Rastogi, [0018] "Still referring to FIG. 1, the measurement system 106 may include or otherwise utilize one or more measurement devices to obtain a current measurement of a physical defect [data from sensors / sensor data], such as, for example, X-ray devices, fiber Bragg grating (FBG) sensors, microscopes, probes, wireless sensor networks... capable of analyzing the structural component 102 to quantify or otherwise characterize the physical defect."; [0015] "… in the context of aviation applications [an operating environment of the aircraft / a real-world environment], where the structural component being analyzed is a structural member of an aircraft")
… an intensity level of the sensor data… (Rastogi, [0020] "... an uncertainty analysis module 122 that receives the degradation development data from the degradation development module 120 and calculates or otherwise determines a probabilistic representation of the degradation development data [an intensity level of the sensor data] using the maintenance threshold"; [0005] "The method involves determining degradation development data for the physical defect based at least in part on a current measurement of the physical defect"; [0015] "The degradation development data set is then utilized to create a probabilistic representation of the mechanical defect determined based at least in part on the degradation development data and uncertainties associated with the measurement of the mechanical defect, the rivet forces, and/or other constituent parameters used to derive the degradation development data.")
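It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the uncertainty model of Kendall/Hernandez-Lobatoc and the dropout training of neural networks of Goel to real-world sensor data/measurements, as taught by Rastogi. The motivation to do so is to determine crack growth progression and remaining usage life. (Rastogi, [0027] "the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression that accounts for uncertainty or variances associated with the current crack length measurement, the rivet forces, the stress intensity factors…"; [0014] "… systems and methods for determining probabilistic remaining usage life (RUL) metrics for structural components that account for both measurement uncertainties…")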

Kendall, Hernandez-Lobatoc, Goel and Rastogi do not teach, but Gal teaches:

for each respective layer of the artificial neural network: identifying, by the computer, a probability density function corresponding to a particular layer; and (Gal, p. 1 Abstract "We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks."; different dropout probabilities in different layers; p.3 3 Concrete Dropout "... dropout is seen as an approximating distribution [a probability density function] to the posterior in a Bayesian neural network with a set of random weight matrices w = {Wl} l = 1..L with L layers [respective layer]"; p.5 Figure 1(d) "(d) Optimised dropout probability values (per layer)."; p. 6 Figure 4 "Figure 4 shows posterior dropout probabilities across different cross validation splits."; "Figure 4: Converged dropout probabilities per layer")
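It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the dropout of Kendall and the tuning of dropout of Goel to incorporate the teachings of Gal by including automatic tuning of the dropout probability. Doing so would allow for automatic tuning of the dropout in large models and, as a result, faster experimentation cycles. (Gal, Abstract "Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed.")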


In regard to claims 5, 15 and 20, Kendall, Hernandez-Lobatoc, Goel, Rastogi and Gal teach: The computer-implemented method of claim 4 further comprising: inputting, by the computer, the obtained output of the artificial neural network into (Gal, p. 3 "… fw(xi) the neural network’s output [the obtained output of the artificial neural network] on input xi when evaluated with weight matrices realisation.) each different type of probability density function corresponding to each respective hidden layer in the plurality of hidden layers (Gal, p. 1 Abstract "We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks."; different dropout probabilities in different layers; p.3 3 Concrete Dropout "... dropout is seen as an approximating distribution [different type of probability density function] to the posterior in a Bayesian neural network with a set of random weight matrices w = {Wl} l = 1..L with L layers [respective hidden layer]"; p.5 Figure 1(d) "  (d) Optimised dropout probability values (per layer)."; p. 6 Figure 4 "Figure 4 shows posterior dropout probabilities across different cross validation splits."; "Figure 4: Converged dropout probabilities per layer") to generate edge weight adjustments between nodes based on probabilities o f occurrence of the obtained output in the real- world environment.  (Gal, p. 3 "The optimisation objective that follows from the variational interpretation can be written as: … fw(xi) the neural network’s output on input xi 
[Image: media_image6.png]
when evaluated with weight matrices realisation, and p(yi|fw(xi)) the model’s likelihood…"; the objective function is used for weight adjustments based on p(yi|fw(xi)) [probabilities of occurrence of the obtained output].)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the dropout of Kendall and the tuning of dropout of Goel to incorporate the teachings of Gal by including automatic tuning of the dropout probability. Doing so would allow for automatic tuning of the dropout probability in large models and, as a result, faster experimentation cycles. (Gal, Abstract "Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed.")

In regard to claims 6 and 21, Kendall, Hernandez-Lobatoc, Goel, Rastogi and Gal teach: The computer-implemented method of claim 5 further comprising: backpropagating, by the computer, the model error through the artificial neural network to update edge weights between nodes in the plurality of hidden layers based on a level of contribution by each respective node to the model error; and (H.L., p. 2, 2. Probabilistic neural network models "The NN has L layers, where Vl is the number of hidden units in layer l, and W is the collection of Vl... weight matrices between the fully connected layers."; 3. Probabilistic backpropagation "In the second phase, the derivatives of the training loss with respect to the weights [e.g. a level of contribution to the model error] are propagated back from the output layer towards the input."; d(loss)/d(node weight), the gradient, is a level of contribution by each respective node to the model error; "In this section we describe a probabilistic alternative to the backpropagation algorithm, which we call probabilistic backpropagation (PBP)... In the second phase, the gradients of this quantity with respect to the means and variances of the approximate Gaussian posterior [e.g. a level of contribution to the model error] are propagated back using reverse-mode differentiation as in classic backpropagation.")

[Image: media_image7.png]
adding, by the computer, the edge weight adjustments to the updated edge weights (H.L. p. 3, "These derivatives are used to update the weights using, e.g., stochastic gradient descent with momentum… These derivatives are finally used to update the means and variances of the posterior approximation… A common choice is to approximate this posterior with a distribution that has the same form as q. In this case, the parameters of the new Gaussian beliefs qnew(w) = N(w|mnew, vnew) that minimize the Kullback-Leibler (KL) divergence between s and qnew can then be obtained as a function of m, v... These are the main update equations used by PBP.") between nodes in each respective hidden layer in the plurality of hidden layers (H.L., p. 2, 2. Probabilistic neural network models "The NN has L layers, where Vl is the number of hidden units in layer l [nodes in respective layers], and W is the collection of Vl... weight matrices between the fully connected layers.") to simulate the aleatoric uncertainty. (Kndl., p. 4, 3 Combining Aleatoric and Epistemic Uncertainty in One Model, "We develop models that will allow us to study the effects of modeling either aleatoric uncertainty alone, epistemic uncertainty alone, or modeling both uncertainties together in a single model.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to train the Bayesian deep learning model, as used in Kendall, by a novel scalable method, called probabilistic backpropagation (PBP), as taught by Hernandez-Lobatoc. Doing so would make the model significantly faster than other techniques, while offering competitive predictive abilities. (Kendall, abstract "We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks."; H.L., Abstract "a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP).… PBP is significantly faster than other techniques, while offering competitive predictive abilities.")

Claims 8 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Kendall in view of Hernandez-Lobatoc in view of Goel in view of Rastogi in further view of Nechval ("Prediction of fatigue crack growth process via artificial neural network technique").
In regard to claims 8 and 23, Kendall, Hernandez-Lobatoc, Goel and Rastogi teach: The computer-implemented method of claim 7 further comprising: 
… performing, by the computer, Monte Carlo dropout sampling on the sensor data to determine which node in each respective hidden layer in the plurality of hidden layers is to be randomly dropped out to simulate the unknown event. (Kndl., p. 3 "Dropout variational inference is a practical approach for approximate inference in large and complex models [15]. This inference is done by training a model with dropout before every weight layer [each respective layer of the artificial neural network], and by also performing dropout at test time to sample from the approximate posterior (stochastic forward passes, referred to as Monte Carlo dropout)."; p. 2 "Figure 1 : Illustrating the difference between aleatoric and epistemic uncertainty for semantic segmentation on the CamVid dataset [sensor data][8]. Aleatoric uncertainty captures noise inherent in the observations. In (d) our model exhibits increased aleatoric uncertainty on object boundaries and for objects far from the camera."; A camera is a sensor. Simulating uncertainty for predicting object boundaries is simulating the unknown event.) 
(Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. which node in each respective hidden layer], number of training iterations, learning rate, etc.)."; [0061] "An applied dropout rate may again be the dropout rate (e.g., viewed as a temperature parameter) using the adjuster 306 is an effective way to mitigate against the poor solutions. Dropout training can be viewed as a Monte Carlo approach that optimizes the expected loss over the ensemble of models formed by all possible dropout masks over node outputs (e.g., a Bayesian objective). In one embodiment, a stochastic method for annealed dropout [randomly dropped out] may be employed, and this method may do more than gradually increase the theoretical capacity of the network."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. each respective hidden layer] of the network.") (Kendall teaches dropping out on sensor data to simulate the unknown event. Goel teaches number of nodes in each respective hidden layer to be randomly dropped out.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the dropout in the neural network, as taught by Kendall, by tuning the number of nodes selected per layer for dropout, as taught by Goel. The motivation to do so is to improve the neural network performance. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training)…")
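For illustration only, the Monte Carlo dropout procedure quoted above from Kendall (dropout kept active at test time, with repeated stochastic forward passes) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Kendall or Goel: the one-hidden-layer ReLU network, the names `forward` and `mc_dropout_predict`, the weights, and the dropout rate are all hypothetical.

```python
import random
import statistics

def forward(x, w_hidden, w_out, drop_rate, rng):
    """One stochastic forward pass through a tiny one-hidden-layer ReLU network (hypothetical)."""
    hidden = []
    for w in w_hidden:
        h = max(0.0, w * x)
        # Dropout stays active at test time: each hidden node is zeroed
        # with probability drop_rate; kept nodes are rescaled.
        keep = rng.random() >= drop_rate
        hidden.append(h / (1.0 - drop_rate) if keep else 0.0)
    return sum(wo * h for wo, h in zip(w_out, hidden))

def mc_dropout_predict(x, w_hidden, w_out, drop_rate=0.5, passes=200, seed=0):
    """Repeat stochastic forward passes; return the sample mean and variance."""
    rng = random.Random(seed)
    samples = [forward(x, w_hidden, w_out, drop_rate, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.pvariance(samples)

mean, variance = mc_dropout_predict(1.0, [1.0, 2.0], [1.0, 1.0])
```

In this sketch the spread of the sampled outputs serves as a rough stand-in for model uncertainty; with the dropout rate set to zero every pass is identical and the variance is zero.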

Kendall, Hernandez-Lobatoc, Goel and Rastogi do not teach, but Nechval teaches: responsive to the computer determining that the intensity level of the sensor data is greater than or equal to the intensity level threshold level indicating occurrence of an unknown event, (Nechval, p. 2 "The fatigue crack growing process is classified in three regions according to the change of fatigue crack growth as low as the fatigue threshold (Kth), and the crack growth rate is very slow... In region II... In region III..."; The prediction of fatigue cracks is based on the stress intensity factor. In those three regions, the factor is greater than Kth, the fatigue threshold.)
inputting, by the computer, the sensor data into the artificial neural network (Nechval, p. 1 "In this paper, the artificial neural network (ANN) technique for the data processing of on-line fatigue crack growth monitoring is proposed after analyzing the general technique for fatigue crack growth data."; p. 7 "The multi-layer perceptron network comprises an input layer, an output layer and a number of hidden layers... The number of neurons in each layer may vary dependent on the problem."; fatigue crack growth data greater than the threshold are inputted into ANN with multiple hidden layers.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the aleatoric and epistemic model of Rastogi to incorporate ANN to predict the crack growth, as taught by Nechval. The motivation to do so is to make up the inadequacy of data processing for current technique and on-line monitoring. (Nechval, p. 1 " A model for predicting the fatigue crack growth by ANN is presented... The feasibility of this model was verified by some examples. It makes up the inadequacy of data processing for current technique and on-line monitoring. Hence it has definite realistic meaning for engineering application.")

Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Kendall in view of Hernandez-Lobatoc in view of Goel in view of Rastogi in view of Nechval in further view of Gal.
In regard to claim 9, Kendall, Hernandez-Lobatoc, Goel, Rastogi and Nechval do not teach, but Gal teaches: The computer-implemented method of claim 8 further comprising: selecting, by the computer, a hidden layer in the plurality of hidden layers; and (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly tune network parameters (number of layers [e.g. a layer is selected], number of nodes per layer, number of training iterations, learning rate, etc.).")
identifying, by the computer, a probability density function corresponding to the selected hidden layer in the plurality of hidden layers  (Gal, p. 1 Abstract "We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks."; different dropout probabilities in different layers; p.3, 3 Concrete Dropout "... dropout is seen as an approximating distribution to the posterior in a Bayesian neural network with a set of random weight matrices w = {Wl} l = 1..L with L layers [a hidden layer]... The KL term KL(q(w)|p(w)) is a 'regularisation' term which ensures that the approximate posterior q(w) does not deviate too far from the prior distribution p(w). A note on our choice for a prior is given in appendix B. Assume that the set of variational parameters for the dropout distribution satisfies... {Ml, pl} l= 1..L, a set of mean weight matrices and dropout probabilities such that ..."; q(w) is identified by using KL term, and q(w) is a probability density function for each hidden layer that models an output/posterior of the ANN/BNN.)

In regard to claim 10, Kendall and Hernandez-Lobatoc teach: The computer-implemented method of claim 9 further comprising: 
… dropping out, by the computer, … associated with the unknown event. (Kndl., p. 3 "Dropout variational inference is a practical approach for approximate inference in large and complex models [15]. This inference is done by training a model with dropout before every weight layer [hidden layer], and by also performing dropout at test time to sample from the approximate posterior (stochastic forward passes, Monte Carlo dropout)."; p. 2 "Figure 1 : Illustrating the difference between aleatoric and epistemic uncertainty for semantic segmentation on the CamVid dataset[8]… In (d) our model exhibits increased aleatoric uncertainty on object boundaries and for objects far from the camera." [e.g. unknown event]; p. 3 "To capture epistemic uncertainty in a neural network (NN)… "; Simulating uncertainty for predicting object boundaries is simulating the unknown event.) (Kendall teaches: dropping out to simulate epistemic uncertainty; and Goel teaches: the selected node within the selected hidden layer)
Kendall and Hernandez-Lobatoc do not teach, but Goel teaches: … dropping out, by the computer, the selected node within the selected hidden layer (Goel, [0006] "'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network... to tune network parameters (number of layers, number of nodes per layer [e.g.  a node within the selected hidden layer], number of training iterations, learning rate, etc.).") 
selecting, by the computer, a node within the selected hidden layer to be randomly dropped out (Goel, [0006] "'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a node within the selected hidden layer], number of training iterations, learning rate, etc.)."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. the selected hidden layer] of the network.") based on applying the intensity level of the sensor data to the identified probability density function; and (Goel, [0030] "In an embodiment, the dropout rate may be annealed, and the probability distribution over dropout rates (and other parameters / hyperparameters) may be adjusted / evolved."; [0067] "For example, if the dropout rate is a function only of the training epoch t, a general formulation according to one embodiment of the … ≦pd[t]≦1 is the dropout probability at epoch t, and αt(θ) is an annealing rate parameter (e.g., dropout rate parameter) that may optionally depend on the current state (or estimate of the state) of auxiliary inputs/parameters θ (including, for example, p_d[t′] for t′<t). It is noted that the term 'annealing' implies that 0≦αt≦1, but using variable rate annealing schedules 311 to determine the dropout rate for successive iterations (e.g., instead of a constant or static dropout rate) that increase (or decrease) the dropout rate to be used for the next iteration (e.g. sample the dropout rate from a current distribution estimate) may also be utilized."; [0086] "As mentioned above, hyperparameters (e.g., the dropout rate) can be reduced or increased so that the next iteration fits the training data more appropriately.") (Goel teaches: based on applying current state / intensity level of input data to the probability density function; and Rastogi teaches: the intensity level of sensor data.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the dropout in the neural network, as taught by Kendall, by tuning the number of nodes selected per layer for dropout, as taught by Goel. The motivation to do so is to improve the neural network performance. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training)…")

Kendall, Hernandez-Lobatoc and Goel do not teach, but Rastogi teaches: … the intensity level of the sensor data (Rastogi, [0018] "Still referring to FIG. 1, the measurement system 106 may include or otherwise utilize one or more measurement devices to obtain a current measurement of a physical defect [data from sensors / sensor data]"; [0020] "... an uncertainty analysis module 122 that receives the degradation development data from the degradation development module 120 and calculates or a probabilistic representation of the degradation development data [an intensity level of the sensor data] using the maintenance threshold")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the uncertainty model of Kendall and Hernandez-Lobatoc and the dropout training of neural networks of Goel to real-world sensor data / measurements, as taught by Rastogi. The motivation to do so is to determine crack growth progression and remaining usage life of an aircraft in a real-world situation. (Rastogi, [0027] "the uncertainty modeling application 220 includes an aleatoric uncertainty module 222 that calculates or otherwise determines a probabilistic representation of the crack growth progression that accounts for uncertainty or variances associated with the current crack length measurement, the rivet forces, the stress intensity factors…"; [0014] "… systems and methods for determining probabilistic remaining usage life (RUL) metrics for structural components that account for both measurement uncertainties…")
Response to Arguments
Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered but they are moot:
 (a) Applicant argues: (see p. 10 bottom, claim 1): “…Notably, there is no mention of 'simulating uncertainty' in 'an artificial neural network', as claimed. Instead, uncertainty predictions 'for products' is reduced…Notably, there is no mention of 'simulating uncertainty' in 'an artificial neural network', as claimed. Instead, (1) a learning model includes a 'neural network' that is applied to process 'data structures' and (2) dominant uncertainties are 'propagated' from…” 
(see p. 28 top, claim 8): “…There is no mention of performing Monte Carlo dropout sampling on 'sensor data' to determine which node in each respective hidden layer in the plurality of hidden layers is to be randomly dropped out to simulate the unknown event, as claimed… Goel also does not 
(see p. 31 top, claim 24): “The combined teachings of the cited references do not describe performing 'aleatoric uncertainty' simulation. Instead, Engel is alleged to teach 'simulating' and Kendall is alleged to teach 'aleatoric uncertainty' - but the combined teachings do not describe actually simulating 'aleatoric uncertainty', as claimed.”
(see p. 31 bottom, claim 25): “(1) …. in an artificial network”
(see p. 47 bottom, claim 10): “…the 'dropping out' and 'simulating' are each treated in complete isolation from one another since one claimed feature ('dropping out') is alleged to be described by one reference …” 
(b) Examiner answers: the arguments do not apply to the new citation from Kendall being used in the current rejection.

(a) Applicant argues:(see p. 11 bottom, claim 1) “Notably, there is no mention of a computer performing an 'action' corresponding to an 'object' sending 'the sensor data…”
(see p. 12 bottom, claim 1) “there is no mention of (1) an 'object' sending 'the sensor data', (2) a computer performing an 'action' corresponding to an 'object' sending 'the sensor data', or (3) performing an 'action' based on 'the impact of simulating the aleatoric uncertainty and the epistemic uncertainty', as claimed…” 
 (see p. 17 bottom - p. 19, claim 3) “… modifying Engel to incorporate the teachings of Kendall would not make the Engel model more robust since Engel does not describe a 'model' for which robustness would be effectuated by the Kendall teachings.”

(see p. 31 bottom, claim 25): “(2) …. Object sending the sensor data… (3) action based on …”

(b) Examiner answers: the arguments do not apply to Rastogi being used in the current rejection. 

(a) Applicant argues: (see p. 20 top, claim 4): “It is noted that Kendall and Engel do not describe any use of 'labeled sensor data samples', as claimed. Accordingly, the combined teachings of the cited references do not describe using 'labeled sensor data samples' when running an artificial neural network that includes a plurality of hidden layers” 
(b) Examiner answers: the arguments do not apply to the new citation from Hernandez-Lobatoc being used in the current rejection. The given target variables yn are labeled data samples used in the probabilistic neural network models.

(a) Applicant argues: (see p. 26 bottom, claim 8): “…There is no mention of performing a sensor data 'input' operation into an artificial neural network responsive to the computer 'determining' that the intensity level of the sensor data is greater than or equal to the intensity level threshold level indicating occurrence of an unknown event. Instead, a model is described that has an 'input state', an assortment of 'model parameters', and 'usage' as input.” 
(b) Examiner answers: the arguments do not apply to Nechval being used in the current rejection. 

(a) Applicant argues: (see p. 46 middle, claim 10): “… the 'selecting' and 'applying' are each treated in complete isolation from one another since one claimed feature ('selecting') is alleged to be described by one reference, and another claimed feature ('applying') is alleged to be described by another reference...” 
(b) Examiner answers: the arguments do not apply to the new citation of Goel being used in the current rejection. Goel teaches the ‘apply’ feature in [0030] (“dropout rate parameter) that may optionally depend on the current state”). That is, Goel teaches: dropping out based on applying the current state / level of input data to the probability density function.

(a) Applicant argues: (see p. 43 middle, claim 9): “…Choi depicts a 'mixture density network' with two hidden layers and an output. Notably, Choi Figure 1 does not describe any operational steps being performed, and therefore Choi Figure 1 does not describe or depict any 'identifying' step/action, as is provided by the features of Claim 9.”
(b) Examiner answers: the arguments do not apply to the new citation of Gal being used in the current rejection. q(w) is identified by using the KL term, and q(w) is a probability density function for each hidden layer that models an output/posterior of the ANN/BNN.

(a) Applicant argues: 
(see p. 14 middle – p. 17 top, claim 1) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 17 middle – p. 19, claim 3) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 20 bottom – p. 22, claim 4) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”

(see p. 28 middle – p. 30, claim 8) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 31 bottom, claim 25): “(5) Failure to Comply with US Supreme Court Precedent#1… (6) Failure to Comply with US Supreme Court Precedent#2”
(see p. 33 – p. 35, claim 2) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 37 – p. 38, claim 5) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 40 bottom – p. 42, claim 6) “Failure to Comply with US Supreme Court Precedent#1… The Failure to Comply with US Supreme Court Precedent#2”
(see p. 44 – p. 45, claim 9) “Failure to Comply with US Supreme Court Precedent#1… Failure to Comply with US Supreme Court Precedent#2”
(see p. 48 – p. 49, claim 10) “Failure to Comply with US Supreme Court Precedent#1… to incorporate the teachings of Goel… Failure to Comply with US Supreme Court Precedent#2”
(b) Examiner answers: the arguments do not apply to the new motivations in the current rejection.


Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered but they are not persuasive:
 (a) Applicant argues: (see p. 13 middle, claim 1): “…Notably, none of the cited references describe dropping out an actual 'node', as claimed. Instead, 'inputs and outputs' of a node are dropped out, but the node remains.” 
(b) Examiner answers: Dropout may not necessarily include deleting neurons / nodes. The broadest reasonable interpretation (BRI) of dropout includes deactivating those neurons temporarily and randomly by zeroing the multiplication operations on the inputs or outputs of the neurons. If the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel also teaches that a node is dropped out in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, dropping out a node is equivalent to excluding all of its incident edges from consideration, and Goel teaches the claimed invention.
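For illustration only, the equivalence stated above (zeroing a node's output has the same downstream effect as removing that node's outgoing edges) can be checked numerically. This is a minimal sketch under stated assumptions, not an implementation from Goel or any other cited reference: the function names, the three-node hidden layer, and the weights are hypothetical.

```python
def layer_output(inputs, weights):
    """weights[j][i] is the edge weight from input node i to output node j."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def drop_by_zeroing(activations, dropped):
    """Deactivate the dropped nodes by multiplying their outputs by zero."""
    return [0.0 if i in dropped else a for i, a in enumerate(activations)]

def drop_by_removing_edges(weights, dropped):
    """Remove (zero) every outgoing edge of the dropped nodes instead."""
    return [[0.0 if i in dropped else w for i, w in enumerate(row)] for row in weights]

hidden = [0.5, -1.0, 2.0]                    # hypothetical hidden-layer activations
w_next = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]  # hypothetical weights into the next layer
dropped = {1}                                # drop the second hidden node

out_zeroed = layer_output(drop_by_zeroing(hidden, dropped), w_next)
out_pruned = layer_output(hidden, drop_by_removing_edges(w_next, dropped))
```

Under these assumptions both computations give the same next-layer values, so the dropped node contributes nothing in either formulation.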

(a) Applicant argues: (see p. 14 top, claim 1): “…Appellant urges that Kendall's 'dropout rate p' is not associated with, and does not correspond to, 'dropped out nodes', as claimed. Instead, this Kendall dropout is described as 'performing dropout at test time to sample from the approximate posterior (stochastic forward passes, referred to as Monte Carlo dropout)'. This 'Monte Carlo dropout' is not described as being any type of 'dropped out nodes', as claimed. Instead, the Kendall 'dropout' pertains to sampling 'from the approximate posterior' (Kendall page 3).”
(b) Examiner answers: In Bayesian deep learning, or a Bayesian neural network, each weight of the network connections has its own mean and variance, and is described as a posterior distribution. As noted in the previous Office Action, the term ‘dropout’ does not mean actually deleting neurons / nodes; it means deactivating those neurons temporarily and randomly by zeroing the weights of the neurons. If the weight of a neuron is multiplied by zero, the neuron is dropped out.
When applying dropout to a standard neural network, the dropout is chosen / sampled from the connections with fixed weights. On the other hand, when applying dropout to a Bayesian neural 

(a) Applicant argues: (see p. 31 bottom, claim 25): “(4) …. Dropping out 'a selected node'”
(see p. 32 bottom, claim 2): “[0006], [0061], [0065] and [0070] …There is no mention of selecting 'a node' to be 'randomly dropped' from each layer of the artificial neural network, as claimed …” 
(b) Examiner answers: As mentioned above, dropout may not necessarily include deleting neurons / nodes. If the input or output of a neuron is multiplied by zero, the neuron is deactivated and dropped out. Goel teaches 'a node' to be 'randomly dropped' from each layer of the artificial neural network in [0006] and [0070] (“'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) … to tune network parameters (…number of nodes per layer…)”). ‘per layer’ means each layer of the artificial neural network. Therefore, Goel teaches the claimed invention.

(a) Applicant argues: (see p. 33 middle, claim 2): “…both the ‘identifying’ and ‘selecting’ steps/actions are performed for each respective layer of the artificial neural network… there is no mention of selecting 'the node to be randomly dropped' for each respective layer of the artificial neural network.” 
(b) Examiner answers: As mentioned above, dropout may not necessarily include deleting neurons / nodes. If the input or output of a neuron is multiplied by zero, the neuron is deactivated and dropped out. Goel teaches 'a node' to be 'randomly dropped' from each layer of the artificial neural network in [0006] and [0070] (“'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) … to tune network parameters (…number of nodes per layer…)”). ‘per layer’ means each respective layer of the artificial neural network. Therefore, Goel teaches the claimed invention.

(a) Applicant argues: (see p. 36 middle, claim 5): “that have no operational synergy with respect to the 'target' of where the obtained output is input 'into'… Gal does not describe that 'the obtained output' is input into each different type of probability density function corresponding to each respective hidden layer in the plurality of hidden layers, as is provided by the plain meaning of Claim 5. Instead, Gal describes "the neural network's output on input xi. This 'output' is not input into 'each' different type of probability density function, as claimed.” 
(b) Examiner answers: As noted in the previous office action, Gal teaches this feature on p.3 

[Image: media_image8.png]


That is, fw(xi), the obtained output, is used in the objective function, i.e. fw(xi), the obtained output, is inputted / provided to the neural network for calculating the objective function and updating the weights. Gal also teaches this in (“fw(xi) the neural network’s output on input xi when evaluated with weight matrices realisation [updating weights in the neural network]”). Therefore, Gal teaches the claimed invention.

(a) Applicant argues: (see p. 36 bottom, claims 5): “…there is no mention of (1) generating 'edge weight adjustments' between nodes, (2) 'probabilities' of occurrence of the obtained output in the real-world environment, or (3) using 'probabilities' of occurrence of the obtained output in the real-world environment as the basis for generating 'edge weight adjustments' between nodes. Instead, an optimization objective is described.” 
(b) Examiner answers: As noted in the previous office action, Gal teaches this feature on p.3  (“fw(xi) the neural network’s output on input xi when evaluated with weight matrices realisation, and p(yi|fw(xi)) the model’s likelihood, e.g. a Gaussian with mean fw(xi).")
That is, (1) fw(xi) is used when running the weight matrices realisation, i.e. weight adjustments between nodes are generated based on fw(xi). (2) p(yi|fw(xi)) gives the probabilities of occurrence of the obtained output. (3) fw(xi) is used in the objective function, i.e. fw(xi) is inputted / provided to the neural network for calculating the objective function and updating the weights (weight adjustments between nodes). Gal teaches the claimed invention.
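For illustration only, the relationship described above between a Gaussian likelihood p(yi|fw(xi)) with mean fw(xi) and a weight adjustment can be sketched as follows. This is a minimal sketch under stated assumptions, not Gal's implementation: the scalar model f_w(x) = w * x, the fixed standard deviation, the learning rate, and the function names are hypothetical, and the KL regularisation term of Gal's objective is omitted.

```python
import math

def gaussian_nll(y, mean, sigma=1.0):
    """Negative log-likelihood of y under a Gaussian with the given mean and sigma."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mean) ** 2 / (2.0 * sigma ** 2)

def sgd_step(w, x, y, lr=0.1, sigma=1.0):
    """One gradient step on the NLL for the hypothetical model f_w(x) = w * x."""
    grad = -(y - w * x) * x / sigma ** 2  # d(NLL)/dw
    return w - lr * grad                  # weight plus its adjustment

w_old = 0.0
w_new = sgd_step(w_old, x=1.0, y=2.0)
```

In this sketch the adjustment added to the weight is driven entirely by the likelihood of the observed target given the network output, and one step lowers the negative log-likelihood.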

(a) Applicant argues: (see p. 39 middle, claims 6): “…no operational steps are described in this section 2, so this cited passage does not teach the claimed 'backpropagating' step/action. In section 3, Hernandez-Lobatoc describes that 'the derivatives of the training loss' are propagated back from the output layer toward the input. In contrast, per Claim 6, 'the model error' is what is backpropagated…” 
(b) Examiner answers: As noted in the previous office action, Hernandez-Lobatoc teaches this feature in section 3, Probabilistic backpropagation (“In the second phase, the derivatives of the training loss with respect to the weights are propagated back from the output layer towards the input”). That is, the training loss is the model error, which is propagated back from the output layer towards the input. Therefore, Hernandez-Lobatoc teaches the claimed invention.

(a) Applicant argues: (see p. 40 middle, claim 6): “… there is no mention of. (1) 'the edge weight adjustments' that are generated between nodes based on probabilities of occurrence of the obtained output in the real-world environment are added to (2) 'the updated edge weights between nodes' that are updated based on a level of contribution by each respective node to 'the model error' that is determined based on a delta between a target output and the obtained output', as claimed. Instead, 
(b) Examiner answers: Claim 6 does not require “'the edge weight adjustments' that are generated between nodes based on probabilities of occurrence of the obtained output in the real-world environment.” Gal teaches this particular feature in claim 5. Further, claim 6 does not require “the model error' that is determined based on a delta between a target output and the obtained output.” Kendall teaches this particular feature in claim 4. The claim only requires “the edge weight adjustments are added to the updated edge weights.” Because the weights in probabilistic backpropagation (PBP) are random, a Gaussian distribution is used to approximate each weight. The update rule in PBP accordingly updates the means and the variances of the distribution, i.e. this distribution represents the weights being updated / adjusted. Therefore, Hernandez-Lobatoc teaches the claimed invention.
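For illustration only, updating the mean and the variance of a Gaussian over a weight can be sketched as follows. This is a minimal sketch that assumes a single weight, a linear model y = w * x plus Gaussian noise, and a standard conjugate (Kalman-style) update; it is chosen for simplicity and is not the update rule stated in Hernandez-Lobatoc. The function name and the `noise_var` parameter are hypothetical.

```python
def update_gaussian_weight(mean, var, x, y, noise_var=1.0):
    # Weight is modeled as w ~ N(mean, var); observation is y = w * x + noise.
    s = var * x * x + noise_var              # predictive variance of y
    gain = var * x / s                       # how strongly the residual moves the mean
    new_mean = mean + gain * (y - mean * x)  # adjustment added to the previous mean
    new_var = var - gain * x * var           # variance shrinks after the observation
    return new_mean, new_var

new_mean, new_var = update_gaussian_weight(mean=0.0, var=1.0, x=1.0, y=2.0)
```

In this toy case the adjustment `gain * (y - mean * x)` is added to the previous mean, and the variance is reduced to reflect the new observation.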

(a) Applicant argues: (see p. 40 bottom, claim 6): “…that the 'adding' and 'simulating' are each treated in complete isolation from one another since one claimed feature ('adding') is alleged to be described by one reference, and another claimed feature ('simulating') is alleged to be described by another reference…” 
(b) Examiner answers: Hernandez-Lobatoc is used to teach adding weight adjustments in the neural network in machine learning / training, and Kendall is used to teach using a neural network to simulate / learn the uncertainty. Both Hernandez-Lobatoc and Kendall are analogous art because they are directed to the same field of endeavor, learning using neural networks. Therefore, Hernandez-Lobatoc and Kendall teach the claimed invention.

(a) Applicant argues: (see p. 46 middle, claim 10): “…Goel does not teach selecting a 'node' to be randomly dropped out, but instead describes dropping out 'inputs or outputs' of a node” 
(b) Examiner answers: As mentioned above, dropout does not necessarily include deleting neurons / nodes. It deactivates those neurons temporarily and randomly by zeroing the multiplication operations on the inputs or outputs of the neurons. If the input or output of a neuron is multiplied by zero, the neuron is dropped out. Further, Goel also teaches that a node is dropped out in [0020] (“The term ‘dropout’ refers herein to dropping out (or adding) nodes/neurons (or other input or output data). In one embodiment, dropout training includes temporarily removing (or adding) one or more nodes/neurons, and temporarily removing (or adding) all incoming and outgoing connections to the removed (or added) nodes/neurons.”). Therefore, Goel teaches the claimed invention.
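For illustration only, deactivating randomly selected nodes by multiplying their outputs by zero can be sketched as follows. This is a minimal sketch of a commonly used dropout masking scheme; the function names, the drop probability, and the sample activations are hypothetical and are not taken from Goel or from the claims.

```python
import random

def dropout_mask(num_nodes, drop_prob, rng):
    # 0 marks a node that is temporarily deactivated; 1 marks a kept node.
    return [0 if rng.random() < drop_prob else 1 for _ in range(num_nodes)]

def apply_dropout(activations, mask):
    # Multiplying by 0 zeroes a dropped node's output; kept outputs are unchanged.
    return [a * m for a, m in zip(activations, mask)]

rng = random.Random(0)  # fixed seed so the random selection is repeatable
activations = [0.5, -1.2, 3.0, 0.7]
mask = dropout_mask(len(activations), drop_prob=0.5, rng=rng)
dropped = apply_dropout(activations, mask)
```

The node itself is not deleted in this sketch: a fresh mask can be drawn on the next training step, which restores the node.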

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact 
/S.C./Examiner, Art Unit 2122


/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122