DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/07/2018 and 09/04/2019 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 4, 7, 8, 11, 13, 14, 16, 18, 19, 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over Engel (US 20170124448 A1) in view of Kendall ("What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?") in view of Hernandez-Lobatoc ("Probabilistic backpropagation for scalable learning of bayesian neural networks") in further view of Goel (US 20160307096 A1).
In regard to claims 1, 11, 16, 24 and 25, Engel teaches: A computer-implemented method for simulating uncertainty in an artificial neural network, the computer-implemented method comprising: (Engel, [0014] "The concurrent uncertainty management system (“management system”) reduces respective uncertainty in life expectancy predictions for products (e.g., aircraft, land vehicles) by performing continuous life expectancy predictions for the products based on uncertainty data generated from various phases of a life cycle of a respective product."; [0045] "the method 500 can also include applying at least one learning model to process the data structures, the at least one learning model includes at least one of a Bayesian learning model or a neural network [an artificial neural network]. This can include overlaying a belief network on cause-and-effect structure in the data structures to propagate dominant uncertainties from their respective sources to respective product parameters of interest.")
simulating, by a computer, (Engel, [0046] "FIG. 6 illustrates an example processing framework for a concurrent uncertainty management system 600 where uncertainty data is processed at each stage of a product lifecycle to extend product lifetime. The system 600 includes one or more computers 602 which execute computer executable instructions from a memory 604 (or memories).")
… from sensor data received from an object operating in a real-world environment (Engel, [0025] "While input distributions refined by learning methods typically apply to the general population of components at the fleet level, each individual component has its own unique distribution that is refined using sensor data [sensor data] from that particular product / aircraft / component [e.g. an object]."; [0026] "The UDD can be updated from various types of uncertainty regarding components, manufacture, or use. Uncertainty exists in three basic forms: aleatoric, epistemic, and prejudicial."; [0014] "The concurrent uncertainty management system (“management system”) reduces respective uncertainty in life expectancy predictions for products (e.g., aircraft, land vehicles) [e.g. an object] by performing continuous life expectancy predictions..."; [0015] "Thus, the management system can be configured to update in real-time the virtual model to reflect the real-world product based on respective uncertainty descriptor data from the manufacturing and sustainment phase [a real-world environment]."; [0018] "As used herein, the term product can refer to a component, an assembly, a sub-assembly, and so forth that contribute to collectively perform functions of the product which can include vehicles, aircraft, electronic products [e.g. an object], and so forth."; see Fig. 4 "Real-time Health Management", "Intrinsic sensing -- as flown", "Individual vehicles -- as maintained")
… performing, by the computer, an action corresponding to the object sending the sensor data and operating in the real-world environment… (Engel, [0031] "As shown, one or more user interfaces 294 can be provided to interact with the framework 210. This includes receiving change notifications, updating uncertainties, and providing product lifetime extension/reduction updates to management [performing an action] regarding on-going and automated evaluations of uncertainty across each stage of the product lifetime.")
Engel fails to teach, but Kendall teaches: aleatoric uncertainty to measure what the artificial neural network does not understand… (Kndl., p. 1 "There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations... Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models [the artificial neural network] for vision tasks."; "In Bayesian modeling, there are two main types of uncertainty one can model [7]. Aleatoric uncertainty captures noise inherent in the observations. This could be for example sensor noise or motion noise, resulting in uncertainty which cannot be reduced even if more data were to be collected.")

simulating, by the computer, epistemic uncertainty to measure what the artificial neural network does not know (Kndl. p. 1, "On the other hand, epistemic uncertainty accounts for uncertainty in the model – uncertainty which can be explained away given enough data.") by dropping out... from each respective layer of the artificial neural network during forward propagation of the sensor data and (Kndl., p. 3 "Dropout variational inference is a practical approach for approximate inference in large and complex models [15]. This inference is done by training a model with dropout before every weight layer [each respective layer of the artificial neural network], and by also performing dropout at test time to sample from the approximate posterior (stochastic forward passes [forward propagation of the sensor data], referred to as Monte Carlo dropout).") measuring impact of dropped out nodes on the output data of the artificial neural network; and (Kndl., p. 3 "Denoting the random output of the BNN as fW(x)"; "The minimisation objective is given by… with N data points, dropout probability p … σ the model’s observation noise parameter – capturing how much noise we have in the outputs… by marginalising over the (approximate) weights posterior distribution"; The objective function including the dropout rate p is used to measure the performance of the BNN, f is a Bayesian convolutional neural network parametrised by model weights W (p.5).)
…based on the impact of simulating the aleatoric uncertainty and the epistemic uncertainty.(Kndl., p. 4, 3 Combining Aleatoric and Epistemic Uncertainty in One Model, "We develop models that will allow us to study the effects of modeling either aleatoric uncertainty alone, epistemic uncertainty alone, or modeling both uncertainties together in a single model.")

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel to incorporate the teachings of Kendall by including an explicit uncertainty formulation with new loss functions. Doing so would make the model more robust to noisy data and would also give new state-of-the-art results on segmentation and depth regression benchmarks. (Kndl., Abstract "We study the benefits of modeling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.")
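For illustration only, the following Python sketch shows the general idea of Monte Carlo dropout as characterized above: dropout is kept active at test time, several stochastic forward passes are run, and the spread of the outputs is read as epistemic uncertainty. It is not taken from Kendall or any other cited reference. The tiny network, the names `forward` and `mc_dropout_predict`, and the fixed `noise_var` (standing in for an aleatoric observation-noise variance, which Kendall describes as a model parameter) are all hypothetical.

```python
import random
import statistics


def forward(x, weights, drop_prob, rng):
    """One stochastic forward pass through a small fully connected ReLU network.

    Dropout is applied to the inputs of every weight layer (assumed layout).
    """
    activations = list(x)
    for layer_index, layer in enumerate(weights):
        # Drop each unit with probability drop_prob; rescale the kept units.
        kept = [a / (1.0 - drop_prob) if rng.random() >= drop_prob else 0.0
                for a in activations]
        pre = [sum(w * a for w, a in zip(row, kept)) for row in layer]
        is_last = layer_index == len(weights) - 1
        activations = pre if is_last else [max(0.0, p) for p in pre]
    return activations[0]


def mc_dropout_predict(x, weights, drop_prob=0.2, passes=200,
                       noise_var=0.05, seed=0):
    """Return (mean, epistemic variance, aleatoric variance, total variance)."""
    rng = random.Random(seed)
    samples = [forward(x, weights, drop_prob, rng) for _ in range(passes)]
    mean = statistics.mean(samples)
    # Spread across stochastic passes: the effect of the dropped-out nodes.
    epistemic_var = statistics.pvariance(samples)
    # noise_var is a hypothetical fixed stand-in for observation noise.
    return mean, epistemic_var, noise_var, epistemic_var + noise_var


# Hypothetical 2-2-1 network and input, chosen only for the example.
WEIGHTS = [[[0.5, -0.3], [0.8, 0.2]], [[1.0, -1.0]]]
mean, epistemic, aleatoric, total = mc_dropout_predict([1.0, 2.0], WEIGHTS)
```

With `drop_prob=0.0` every pass is identical, so the epistemic term collapses to zero and only the assumed observation noise remains.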
Engel and Kendall fail to teach, but Hernandez-Lobatoc teaches: … by adding random values to edge weights between nodes in the artificial neural network during backpropagation of output data of the artificial neural network and (H.L., 3. Probabilistic backpropagation "In the second phase, the derivatives of the training loss with respect to the weights are propagated back from the output layer [during backpropagation of output data of the artificial neural network] towards the input. These derivatives are used to update the weights using, e.g., stochastic gradient descent with momentum... since the weights [edge weights] are now random... In the second phase, the gradients of this quantity with respect to the means and variances of the approximate Gaussian posterior are propagated back using reverse-mode differentiation as in classic backpropagation. These derivatives are finally used to update the means and variances of the posterior approximation... PBP uses the following property of Gaussian distributions (Minka, 2001). Let f(w) encode an arbitrary likelihood function for the single synaptic weight |w| given some data and let our current beliefs regarding the scalar |w| be captured by a distribution q(w) = N(w|m,v)."; Instead of just w, now single synaptic weights have mean and variance in a distribution, which are values describing the randomness of w.) measuring impact on the output data by the added random values to the edge weights between the nodes; (H.L., p. 3 "The update rule our beliefs about w are updated according to Bayes’ rule..."; "Given a new input vector x*, we can then make predictions for its output y* using the predictive distribution given by..."; Weights are updated to impact the output data of the neural network model.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel and Kendall to incorporate the teachings of Hernandez-Lobatoc by including PBP. Doing so would make the model significantly faster than other techniques, while offering competitive predictive abilities. (H.L., Abstract "A series of experiments on ten real-world datasets show that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show that PBP provides accurate estimates of the posterior variance on the network weights.")
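For illustration only, the sketch below shows one simple way to add random values to edge weights and measure the resulting impact on the output: each weight is drawn from a Gaussian with a given mean and variance, and the spread of the outputs is recorded. This is plain Monte Carlo sampling and is not the PBP algorithm of Hernandez-Lobatoc, which the reference describes in terms of propagating means and variances. All function and parameter names are hypothetical.

```python
import random
import statistics


def linear_output(x, weights):
    """Output of a single linear unit: the weighted sum of its inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def output_spread_under_weight_noise(x, means, variances, draws=500, seed=0):
    """Sample each weight from N(mean, variance) and measure the output spread.

    Returns (baseline output at the mean weights, mean sampled output,
    standard deviation of the sampled outputs).
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(draws):
        # random.gauss takes a standard deviation, hence the square root.
        sampled = [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
        outputs.append(linear_output(x, sampled))
    baseline = linear_output(x, means)
    return baseline, statistics.mean(outputs), statistics.pstdev(outputs)


# Hypothetical inputs, weight means and weight variances.
baseline, sampled_mean, spread = output_spread_under_weight_noise(
    [1.0, 2.0], means=[0.5, -0.25], variances=[0.01, 0.04])
```

A larger weight variance produces a larger output spread, which is the quantity a practitioner would inspect to gauge sensitivity to the perturbed weights.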
Engel, Kendall and Hernandez-Lobatoc fail to teach, but Goel teaches: a selected node from each respective layer of the artificial neural network (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a selected node from each respective layer of the artificial neural network], number of training iterations, learning rate, etc.).")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall and Hernandez-Lobatoc to incorporate the teachings of Goel by including the dropout rate in the iteration steps. Doing so would maximize the generalization performance of the network and mitigate convergence to poor local minima. (Goel, Abstract "The steps may then be iterated until the generalization performance of the network is maximized."; [0065])
Claims 11 and 16 recite substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claims 11 and 16.
In addition, Engel teaches: (claim 11) A computer system for simulating uncertainty in an artificial neural network, the computer system comprising: a bus system; (Engel, [0028] "FIG. 2 illustrates an example of a communications framework and modules [a bus system for communications] for a concurrent uncertainty management system 200. The system 200 includes a concurrent uncertainty management framework 210 which provides physical input gathering reasoning, network connections, and communication of tagged identifiers to update models with respect to uncertainty descriptor data (UDD).")
a storage device connected to the bus system, wherein the storage device stores program instructions; and a processor connected to the bus system, wherein the processor executes the program instructions to (Engel, [0046] "The system 600 includes one or more computers 602 which execute computer executable instructions from a memory 604 (or memories)."; The system includes computers, where processors are inherent.)
(claim 16) A computer program product for simulating uncertainty in an artificial neural network, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: (Engel, [0046] "The system 600 includes one or more computers 602 which execute computer executable instructions from a memory 604 (or memories).")
Claims 24 and 25 are broader versions of claim 1; therefore, the rejection applied to claim 1 also applies to claims 24 and 25.
In regard to claims 3, 13 and 18, Engel, Kendall, Hernandez-Lobatoc and Goel teach: The computer-implemented method of claim 1 further comprising: generating, by the computer, an output of the artificial neural network based on simulating the aleatoric uncertainty and the epistemic uncertainty. (Kndl., p. 4, 3 Combining Aleatoric and Epistemic Uncertainty in One Model, "We develop models that will allow us to study the effects of modeling either aleatoric uncertainty alone, epistemic uncertainty alone, or modeling both uncertainties together in a single model.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel to incorporate the teachings of Kendall by including an explicit uncertainty formulation with new loss functions. Doing so would make the model more robust to noisy data and would also give new state-of-the-art results on segmentation and depth regression benchmarks.
In regard to claims 4, 14 and 19, Engel, Kendall, Hernandez-Lobatoc and Goel teach: The computer-implemented method of claim 1 further comprising: running, by the computer, the artificial neural network that includes a plurality of hidden layers (H.L. p. 2 2. Probabilistic neural network models "We denote the outputs of the layers by vectors {zl} l = 0.. L, where z0 is the input layer, {zl} l = 1.. L-1 are the hidden units [hidden layers] and zL denotes the output layer...")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel and Kendall to incorporate the teachings of Hernandez-Lobatoc by including PBP. Doing so would make the model significantly faster than other techniques, while offering competitive predictive abilities.
… using… sensor data samples corresponding to the real-world environment; and (Engel, [0025] "While input distributions refined by learning methods typically apply to the general population of components at the fleet level, each individual component has its own unique distribution that is refined using sensor data [sensor data] from that particular product / aircraft / component."; [0015] "Thus, the management system can be configured to update in real-time the virtual model to reflect the real-world product based on respective uncertainty descriptor data from the manufacturing and sustainment phase [a real-world environment].")

using labeled... data samples... utilizing, by the computer, an obtained output of the artificial neural network to determine model error based on a delta between a target output and the obtained output. (Kndl., p. 5 "We fix a Gaussian likelihood to model our aleatoric uncertainty. This induces a minimisation objective given labeled output points x: … (7) where D is the number of output pixels yi corresponding to input image x, indexed by i..."; LBNN is the model error;  yi - ^yi is the delta between a target output and the obtained output)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel to incorporate the teachings of Kendall by including an explicit uncertainty formulation with new loss functions. Doing so would make the model more robust to noisy data and would also give new state-of-the-art results on segmentation and depth regression benchmarks.
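For illustration only, the sketch below computes a model error from the delta between target and obtained outputs using a Gaussian-likelihood loss in its commonly used heteroscedastic form: each squared delta is scaled down by a predicted variance, and a log-variance penalty is added. It is an assumption that this matches the exact form of the objective Kendall presents; the function name and the choice to pass log-variances are hypothetical.

```python
import math


def gaussian_nll_loss(targets, predictions, log_variances):
    """Average heteroscedastic Gaussian loss over D output points.

    Per point: 0.5 * exp(-s) * (y - y_hat)**2 + 0.5 * s, where s = log(variance).
    """
    total = 0.0
    for y, y_hat, s in zip(targets, predictions, log_variances):
        delta = y - y_hat  # delta between target output and obtained output
        total += 0.5 * math.exp(-s) * delta ** 2 + 0.5 * s
    return total / len(targets)


# A unit error with unit variance (log-variance 0.0) costs 0.5.
example_loss = gaussian_nll_loss([2.0], [1.0], [0.0])
```

Raising the predicted variance for a point with a large delta lowers its contribution to the loss, which is the "learned attenuation" behavior quoted from Kendall's abstract.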
In regard to claims 7 and 22, Engel, Kendall, Hernandez-Lobatoc and Goel teach: The computer-implemented method of claim 1 further comprising: receiving, by the computer, the sensor data from the object operating in the real-world environment; (Engel, [0025] "... each individual component has its own unique distribution that is refined using sensor data [sensor data] from that particular product / aircraft / component [e.g. an object]."; [0015] "Thus, the management system can be configured to update in real-time the virtual model to reflect the real-world product based on respective uncertainty descriptor data from the manufacturing and sustainment phase [a real-world environment]."; [0018] "... the product which can include vehicles, aircraft, electronic products [e.g. an object], and so forth."; see Fig. 4 "Real-time Health Management", "Intrinsic sensing -- as flown", "Individual vehicles -- as maintained")
determining, by the computer, an intensity level of the sensor data; and (Engel, [0025] "... each individual component has its own unique distribution that is refined using sensor data from that particular product / aircraft / component... defect size, for example... Sensors that report defect size [an intensity level of the sensor data] are generally useful when the defect is sufficiently large enough to be accurately quantified. All sensors have their associated uncertainties.")
determining, by the computer, whether the intensity level of the sensor data is greater than or equal to an intensity level threshold level (Engel, [0025] "Defect detection sensors can declare that they detect or do not detect a flaw at their detection threshold [greater than or equal to an intensity level threshold level].") indicating occurrence of an unknown event. (Engel, [0025] "... each individual component has its own unique distribution [occurrence of an event] that is refined using sensor data from that particular product / aircraft / component... Defect detection sensors (e.g., crack, corrosion, delamination [an unknown event] sensors) generally are used in the incipient stages where defects are approaching the detection threshold of the sensor.")
In regard to claims 8 and 23, Engel, Kendall, Hernandez-Lobatoc and Goel teach: The computer-implemented method of claim 7 further comprising: responsive to the computer determining that the intensity level of the sensor data is greater than or equal to the intensity level threshold level indicating occurrence of an unknown event, (Engel, [0025] "... each individual component has its own unique distribution [occurrence of an event] that is refined using sensor data from that particular product / aircraft / component... Defect detection sensors (e.g., crack, corrosion, delamination [an unknown event] sensors) generally are used in the incipient stages where defects are approaching the detection threshold of the sensor. Defect detection sensors can declare that they detect or do not detect a flaw at their detection threshold [greater than or equal to an intensity level threshold level].")
inputting, by the computer, the sensor data into the artificial neural network (Engel, [0022] "A dynamic Bayesian belief network can be employed by the reasoning layer 130, in one example, and can be overlaid on the cause-and-effect structure to propagate dominant uncertainties from their sources to neural networks."; [0023] "In general, the physical layer 110 accepts any deterministic model... Each input distribution can (optionally) be characterized by hyper-parameter distributions that can be refined through Bayesian learning at the reasoning model layer 130.)"; Sensor data from the physical layer is inputted into the reasoning layer which can be updated using neural network method.) that includes a plurality of hidden layers, each hidden layer including a plurality of nodes; and (Goel, [0041] "Referring now to the drawings in which like numerals represent the same or similar elements and initially to FIG. 1A, a high-level example of a neural network 100... A neural network 100 may include a plurality of neurons/nodes 108... The neural network 100 may include a plurality of layers, including, for example, one or more input layers 102, one or more hidden layers 104, and one or more output layers 106.")
performing, by the computer, Monte Carlo dropout sampling on the sensor data to determine which node in each respective hidden layer in the plurality of hidden layers is to be randomly dropped out to simulate the unknown event. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. which node in each respective hidden layer], number of training iterations, learning rate, etc.)."; [0061] "An applied dropout rate may again be sampled (e.g., re-sampled) by the sampler 304 to evaluate generalization performance of the current training iteration."; [0065] "Annealing the dropout rate (e.g., viewed as a temperature parameter) using the adjuster 306 is an effective way to mitigate against the poor solutions. Dropout training can be viewed as a Monte Carlo approach that optimizes the expected loss over the ensemble of models formed by all possible dropout masks over node outputs (e.g., a Bayesian objective). In one embodiment, a stochastic method for annealed dropout [randomly dropped out] may be employed, and this method may do more than gradually increase the theoretical capacity of the network."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. each respective hidden layer] of the network."; Engel teaches sensor data and the unknown event.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall and Hernandez-Lobatoc to incorporate the teachings of Goel by including the dropout rate in the iteration steps. Doing so would maximize the generalization performance of the network and mitigate convergence to poor local minima.
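For illustration only, the sequence recited in claims 7 and 8 (compare an intensity level against a threshold, then randomly choose a node to drop in each hidden layer) can be sketched as below. This is a hypothetical reading of the claim language, not an implementation found in Engel or Goel; the function name, the uniform random choice, and the layer sizes are assumptions.

```python
import random


def select_dropout_nodes(intensity, threshold, hidden_layer_sizes, seed=None):
    """Pick one node index per hidden layer when the threshold is met.

    Returns None when intensity is below threshold (no unknown event assumed),
    otherwise a list holding one randomly chosen node index for each layer.
    """
    if intensity < threshold:
        return None
    rng = random.Random(seed)
    # Uniform choice per layer; a real system could use another distribution.
    return [rng.randrange(size) for size in hidden_layer_sizes]


# Hypothetical reading of 0.7 against a threshold of 0.5, three hidden layers.
dropped = select_dropout_nodes(0.7, 0.5, [4, 3, 5], seed=1)
```

The returned indices would then identify which node in each hidden layer to zero out during a forward pass.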
Claims 2, 5, 6, 12, 15, 17, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Engel in view of Kendall in view of Hernandez-Lobatoc in view of Goel in further view of Gal ("Concrete Dropout").
In regard to claims 2, 12 and 17, Engel teaches: The computer-implemented method of claim 1 further comprising: 
… by using the sensor data corresponding to the real-world environment (Engel, [0025] "... each individual component has its own unique distribution that is refined using sensor data from that particular product/aircraft/component..."; [0015] "Thus, the management system can be configured to update in real-time the virtual model to reflect the real-world product based on respective uncertainty descriptor data from the manufacturing and sustainment phase [a real-world environment]."; see Fig. 4 "Real-time Health Management", "Intrinsic sensing -- as flown", "Individual vehicles -- as maintained")
… an intensity level of the sensor data… (Engel, [0025] "Sensors that report defect size [an intensity level of the sensor data] are generally useful when the defect is sufficiently large enough to be accurately quantified.  All sensors have their associated uncertainties.")

selecting, by the computer, a node to be randomly dropped from each layer of the artificial neural network... for Monte Carlo dropout sampling; and (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a node from each layer of the artificial neural network], number of training iterations, learning rate, etc.)."; [0061] "An applied dropout rate may again be sampled (e.g., re-sampled) by the sampler 304 to evaluate generalization performance of the current training iteration."; [0065] "Annealing the dropout rate (e.g., viewed as a temperature parameter) using the adjuster 306 is an effective way to mitigate against the poor solutions. Dropout training can be viewed as a Monte Carlo approach that optimizes the expected loss over the ensemble of models formed by all possible dropout masks over node outputs (e.g., a Bayesian objective). In one embodiment, a stochastic method for annealed dropout [randomly dropped] may be employed, and this method may do more than gradually increase the theoretical capacity of the network."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. each layer] of the network.")
selecting, by the computer, the node to be randomly dropped from that particular layer based on applying... data to the probability density function corresponding to that particular layer. (Goel, [0006] "Recently, it has been shown that neural network performance may be improved by training the neural network by randomly zeroing, or 'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. dropout rate for the particular hidden layer], number of training iterations, learning rate, etc.)."; [0030] "In an embodiment, the dropout rate may be annealed, and the probability distribution over dropout rates [the probability density function] (and other parameters / hyperparameters) may be adjusted / evolved."; [0067] "For example, if the dropout rate is a function only of the training epoch t, a general formulation according to one embodiment of the present invention may be: p_d[t] = p_d[t−1] + α_t(θ) (4) where 0 ≤ p_d[t] ≤ 1 is the dropout probability at epoch t, and α_t(θ) is an annealing rate parameter (e.g., dropout rate parameter) that may optionally depend on the current state (or estimate of the state) of auxiliary inputs/parameters θ (including, for example, p_d[t′] for t′ < t). It is noted that the term 'annealing' implies that 0 ≤ α_t ≤ 1..."; [0086] "As mentioned above, hyperparameters (e.g., the dropout rate) can be reduced or increased so that the next iteration fits the training data more appropriately."; Goel teaches: based on applying input / training data to the probability density function)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall and Hernandez-Lobatoc to incorporate the teachings of Goel by including the dropout rate in the iteration steps. Doing so would maximize the generalization performance of the network and mitigate convergence to poor local minima.
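For illustration only, the per-epoch update quoted from Goel [0067], p_d[t] = p_d[t−1] + α_t(θ) with 0 ≤ p_d[t] ≤ 1, can be sketched as a simple schedule. The function name, the clipping used to keep the rate in range, and the option of passing either a constant or a per-epoch callable for the step are assumptions; the size and sign of the step are left to the caller.

```python
def anneal_dropout(p_initial, alpha, epochs):
    """Return the dropout rate for epochs 0..epochs under an additive update.

    alpha is either a constant step or a callable alpha(t) giving the step
    for epoch t. Each rate is clipped to the closed interval [0, 1].
    """
    schedule = [p_initial]
    for t in range(1, epochs + 1):
        step = alpha(t) if callable(alpha) else alpha
        schedule.append(min(1.0, max(0.0, schedule[-1] + step)))
    return schedule


# Hypothetical schedule: start at 0.0 and add 0.25 per epoch for 5 epochs.
rates = anneal_dropout(0.0, 0.25, 5)
```

The resulting list can be indexed by epoch to look up the dropout rate applied in that training iteration.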
Engel, Kendall, Hernandez-Lobatoc and Goel fail to teach, but Gal teaches:

for each respective layer of the artificial neural network: identifying, by the computer, a probability density function corresponding to a particular layer; and (Gal, p. 1 Abstract "We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks."; different dropout probabilities in different layers; p.3 3 Concrete Dropout "... dropout is seen as an approximating distribution [a probability density function] to the posterior in a Bayesian neural network with a set of random weight matrices w = {Wl} l = 1..L with L layers [respective layer]"; p.5 Figure 1(d) "(d) Optimised dropout probability values (per layer)."; p. 6 Figure 4 "Figure 4 shows posterior dropout probabilities across different cross validation splits."; "Figure 4: Converged dropout probabilities per layer")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall, Hernandez-Lobatoc and Goel to incorporate the teachings of Gal by including automatic tuning of the dropout probability. Doing so would allow for automatic tuning of the dropout probability in large models and, as a result, faster experimentation cycles. (Gal, Abstract "Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed.")

In regard to claims 5, 15 and 20, Engel, Kendall, Hernandez-Lobatoc, Goel and Gal teach: The computer-implemented method of claim 4 further comprising: inputting, by the computer, the obtained output of the artificial neural network into (Gal, p. 3 "… fw(xi) the neural network’s output [the obtained output of the artificial neural network] on input xi when evaluated with weight matrices realisation.) each different type of probability density function corresponding to each respective hidden layer in the plurality of hidden layers (Gal, p. 1 Abstract "We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout’s discrete masks."; different dropout probabilities in different layers; p.3 3 Concrete Dropout "... dropout is seen as an approximating distribution [different type of probability density function] to the posterior in a Bayesian neural network with a set of random weight matrices w = {Wl} l = 1..L with L layers [respective hidden layer]"; p.5 Figure 1(d) "  (d) Optimised dropout probability values (per layer)."; p. 6 Figure 4 "Figure 4 shows posterior dropout probabilities across different cross validation splits."; "Figure 4: Converged dropout probabilities per layer") to generate edge weight adjustments between nodes based on probabilities o f occurrence of the obtained output in the real- world environment.  (Gal, p. 3 "The optimisation objective that follows from the variational interpretation can be written as: … fw(xi) the neural network’s output on input xi 
when evaluated with weight matrices realisation, and p(yi|fw(xi)) the model’s likelihood, e.g. a Gaussian with mean fw(xi)."; objective function is used for weight adjustments based on p(yi|fw(xi)) [probabilities of occurrence of the obtained output])
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall, Hernandez-Lobatoc and Goel to incorporate the teachings of Gal by including automatic tuning of the dropout probability. Doing so would allow for automatic tuning of the dropout probability in large models and, as a result, faster experimentation cycles.
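As a non-limiting illustration of the cited likelihood p(yi|fw(xi)), "e.g. a Gaussian with mean fw(xi)", the following minimal sketch evaluates a Gaussian negative log-likelihood and its derivative with respect to the network output. The function names, the fixed unit variance, and the numeric values are assumptions made for illustration and do not appear in Gal.

```python
import math


def gaussian_nll(y, mean, variance=1.0):
    """Negative log-likelihood of y under a Gaussian N(mean, variance)."""
    return 0.5 * math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / (2.0 * variance)


def nll_gradient_wrt_mean(y, mean, variance=1.0):
    """Derivative of the negative log-likelihood with respect to the predicted mean."""
    return (mean - y) / variance


# Hypothetical values: a network output f_w(x_i) and an observed target y_i.
prediction, target = 1.5, 2.0
loss = gaussian_nll(target, prediction)
grad = nll_gradient_wrt_mean(target, prediction)
```

In this sketch the gradient is negative when the prediction lies below the target, so a gradient-descent weight adjustment would raise the output and thereby the probability of the observed value.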
In regard to claims 6 and 21, Engel, Kendall, Hernandez-Lobatoc, Goel and Gal teach: The computer-implemented method of claim 5 further comprising: backpropagating, by the computer, the model error through the artificial neural network to update edge weights between nodes in the plurality of hidden layers based on a level of contribution by each respective node to the model error; and (H.L., p.2 2. Probabilistic neural network models "The NN has L layers, where Vl is the number of hidden units in layer l, and W is the collection of Vl... weight matrices between the fully connected layers."; 3. Probabilistic backpropagation "In the second phase, the derivatives of the training loss  with respect to the weights [e.g. a level of contribution to the model error] are propagated back from the output layer towards the input."; d(loss)/d(node weight), the gradient, is a level of contribution by each respective node to the model error; "In this section we describe a probabilistic alternative to the backpropagation algorithm, which we call probabilistic backpropagation (PBP)... In the second phase, the gradients of this quantity with respect to the means and variances of the approximate Gaussian posterior [e.g. a level of contribution to the model error] are propagated back using reverse-mode differentiation as in classic backpropagation.")

adding, by the computer, the edge weight adjustments to the updated edge weights (H.L. p. 3, "These derivatives are used to update the weights using, e.g., stochastic gradient descent with momentum… These derivatives are finally used to update the means and variances of the posterior approximation… A common choice is to approximate this posterior with a distribution that has the same form as q. In this case, the parameters of the new Gaussian beliefs qnew(w) = N(w|mnew, vnew) that minimize the Kullback-Leibler (KL) divergence between s and qnew can then be obtained as a function of m, v... These are the main update equations used by PBP.") between nodes in each respective hidden layer in the plurality of hidden layers (H.L., p.2 2. Probabilistic neural network models "The NN has L layers, where Vl is the number of hidden units in layer l [nodes in respective layers], and W is the collection of Vl... weight matrices between the fully connected layers.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel and Kendall to incorporate the teachings of Hernandez-Lobatoc by including PBP. Doing so would make the model significantly faster than other techniques, while offering competitive predictive abilities.
to simulate the aleatoric uncertainty. (Kndl., p. 4, 3 Combining Aleatoric and Epistemic Uncertainty in One Model, "We develop models that will allow us to study the effects of modeling either aleatoric uncertainty alone, epistemic uncertainty alone, or modeling both uncertainties together in a single model.")
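As a non-limiting illustration of the cited passage that derivatives of the training loss with respect to the weights are propagated back and then used to update the weights with stochastic gradient descent with momentum, the following minimal sketch applies classic backpropagation to a network with one hidden node. It is plain backpropagation, not the probabilistic backpropagation (PBP) of Hernandez-Lobatoc; the network shape, squared-error loss, function names, and hyperparameter values are assumptions made for illustration.

```python
import math


def forward(x, w1, w2):
    """One hidden tanh node followed by a linear output node."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(x, y, w1, w2):
    """Squared-error training loss for a single example."""
    return 0.5 * (forward(x, w1, w2)[1] - y) ** 2


def gradients(x, y, w1, w2):
    """Backpropagate the loss from the output towards the input."""
    h, y_hat = forward(x, w1, w2)
    d_out = y_hat - y                        # dLoss/dy_hat
    g_w2 = d_out * h                         # contribution of the output weight
    g_w1 = d_out * w2 * (1.0 - h * h) * x    # contribution of the hidden weight
    return g_w1, g_w2


def train(x, y, w1, w2, lr=0.1, momentum=0.5, steps=200):
    """Gradient descent with momentum on the two weights."""
    v1 = v2 = 0.0
    for _ in range(steps):
        g1, g2 = gradients(x, y, w1, w2)
        v1 = momentum * v1 - lr * g1
        v2 = momentum * v2 - lr * g2
        w1, w2 = w1 + v1, w2 + v2
    return w1, w2
```

Each gradient component measures how strongly the corresponding weight contributes to the error, which is the sense in which the rejection reads the gradient on "a level of contribution ... to the model error".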
Claims 9, 10 are rejected under 35 U.S.C. 103 as being unpatentable over Engel in view of Kendall in view of Hernandez-Lobatoc in view of Goel in further view of Choi ("Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling").
In regard to claim 9, Engel, Kendall, Hernandez-Lobatoc, Goel fail to teach, but Choi teaches: The computer-implemented method of claim 8 further comprising: selecting, by the computer, a hidden layer in the plurality of hidden layers; and identifying, by the computer, a probability density function corresponding to the selected hidden layer in the plurality of hidden layers  (Choi, III. PRELIMINARIES, A. Uncertainty Acquisition in Deep Learning, Figure 1; B. Mixture Density Network "A mixture density network (MDN) was first proposed in [7] where the output of a neural network is composed of 
parameters constructing a Gaussian mixture model (GMM)... where θ = {πj , μj , and Σj} j = 1 .. K is a set of parameters of a GMM, mixture probabilities, mixture means, and mixture variances, respectively."; Gaussian
distribution is a probability density function corresponding to each of the layers, including each hidden layer in the plurality of hidden layers.) that models an output of the artificial neural network. (Choi, Fig. 1: "A mixture density network (K = 3) with two hidden layers where the output of the network is decomposed into π̂, μ̂, and Σ̂")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall, Hernandez-Lobatoc and Goel to incorporate the teachings of Choi by including a mixture density network. Doing so would provide an uncertainty-aware learning which outperforms other compared methods in terms of safety. (Choi, "In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors... The proposed uncertainty-aware learning from demonstration method outperforms other compared methods in terms of safety using a complex real-world driving dataset.")
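As a non-limiting illustration of the cited mixture density network, in which the network output is decomposed into mixture probabilities, mixture means, and mixture variances of a GMM, the following minimal sketch splits a raw output vector into those three parameter sets and evaluates the resulting density. The one-dimensional target, the softmax and exponential parameterisation, and all function names are assumptions made for illustration and are not taken from Choi.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decompose_output(raw, k):
    """Split 3*k raw outputs into (pi, mu, sigma2) for a one-dimensional GMM."""
    if len(raw) != 3 * k:
        raise ValueError("expected 3*k raw outputs")
    pi = softmax(raw[:k])                          # mixture probabilities, sum to 1
    mu = list(raw[k:2 * k])                        # mixture means, unconstrained
    sigma2 = [math.exp(v) for v in raw[2 * k:]]    # mixture variances, kept positive
    return pi, mu, sigma2


def gmm_density(y, pi, mu, sigma2):
    """Probability density of y under the Gaussian mixture."""
    return sum(
        p * math.exp(-(y - m) ** 2 / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
        for p, m, s in zip(pi, mu, sigma2)
    )


# Hypothetical raw outputs for K = 3 mixtures, the value used in Choi, Fig. 1.
pi, mu, sigma2 = decompose_output([0.0, 0.0, 0.0, -1.0, 0.0, 1.0, 0.0, 0.0, 0.0], k=3)
density_at_zero = gmm_density(0.0, pi, mu, sigma2)
```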
In regard to claim 10, Engel, Kendall, Hernandez-Lobatoc, Goel and Choi teach: The computer-implemented method of claim 9 further comprising: selecting, by the computer, a node within the selected hidden layer to be randomly dropped out (Goel, [0006] "'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network (e.g., dropout training) for each of one or more training sets (including a set of inputs and corresponding expected outputs) to tune network parameters (number of layers, number of nodes per layer [e.g. a node within the selected hidden layer], number of training iterations, learning rate, etc.)."; [0070] "In one embodiment, an aggregation may be implemented over an exponential number of models (e.g., ensemble of models), each with a unique dropout mask over the set of weights for a given layer [e.g. the selected hidden layer] of the network.") based on applying the intensity level of the sensor data (Engel, [0025] "State awareness sensors at the physical layer 110 can provide either defect detection and/or defect size, for example... Sensors that report defect size [an intensity level of the sensor data] are generally useful when the defect is sufficiently large enough to be accurately quantified. All sensors have their associated uncertainties.") to the identified probability density function; and (Goel, [0030] "In an embodiment, the dropout rate may be annealed, and the probability distribution over dropout rates (and other parameters / hyperparameters) may be adjusted / evolved."; [0067] "For example, if the dropout rate is a function only of the training epoch t, a general formulation according to one embodiment of the present invention may be: p_d[t] = p_d[t−1] + α_t(θ) (4) where 0≦p_d[t]≦1 is the dropout probability at epoch t, and α_t(θ) is an annealing rate parameter (e.g., dropout rate parameter) that may optionally depend on the current state (or estimate of the state) of auxiliary inputs/parameters θ (including, for example, p_d[t′] for t′<t). It is noted that the term 'annealing' implies that 0≦α_t≦1, but using variable rate annealing schedules 311 to determine the dropout rate for successive iterations (e.g., instead of a constant or static dropout rate) that increase (or decrease) the dropout rate to be used for the next iteration (e.g. sample the dropout rate from a current distribution estimate) may also be utilized."; [0086] "As mentioned above, hyperparameters (e.g., the dropout rate) can be reduced or increased so that the next iteration fits the training data more appropriately."; Goel teaches: based on applying current state / level of input data to the probability density function; and Engel teaches: the intensity level of the sensor data.)
dropping out, by the computer, the selected node within the selected hidden layer (Goel, [0006] "'dropping out' a fixed percentage of the inputs or outputs of a given node or layer in the neural network... to tune network parameters (number of layers, number of nodes per layer [e.g.  a node within the selected hidden layer], number of training iterations, learning rate, etc.).") 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Engel, Kendall and Hernandez-Lobatoc to incorporate the teachings of Goel by including the dropout rate in the iteration steps. Doing so would maximize the performance of the network and mitigate against convergence to poor local minima.
to simulate epistemic uncertainty associated with the unknown event. (Engel, [0025] "Defect detection sensors (e.g., crack, corrosion, delamination [an unknown event] sensors) generally are used in the incipient stages where defects are approaching the detection threshold of the sensor."; [0026] "The UDD can be updated from various types of uncertainty regarding components, manufacture, or use. Uncertainty exists in three basic forms: aleatoric, epistemic, and prejudicial.")
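As a non-limiting illustration of the dropout-rate update quoted from Goel at [0067], in which the dropout probability at epoch t equals the probability at epoch t−1 plus an annealing rate, the following minimal sketch computes such a schedule and then draws a dropout mask for the nodes of one layer. The clipping used to keep the probability between 0 and 1, the constant annealing rate, the seeded random generator, and all function names are assumptions made for illustration and are not taken from Goel.

```python
import random


def annealed_dropout_schedule(p0, alpha, epochs):
    """Return [p_d[0], ..., p_d[epochs]] with p_d[t] = p_d[t-1] + alpha(t).

    Each value is clipped to [0, 1]; the clipping is an assumption of this sketch.
    """
    def clip(p):
        return min(1.0, max(0.0, p))

    rates = [clip(p0)]
    for t in range(1, epochs + 1):
        rates.append(clip(rates[-1] + alpha(t)))
    return rates


def dropout_mask(num_nodes, p_drop, rng):
    """1 keeps a node and 0 drops it; each node is dropped with probability p_drop."""
    return [0 if rng.random() < p_drop else 1 for _ in range(num_nodes)]


# Hypothetical schedule: start at 0.5 and lower the dropout rate by 0.2 per epoch.
rates = annealed_dropout_schedule(0.5, lambda t: -0.2, epochs=4)
mask = dropout_mask(8, rates[1], random.Random(0))
```

Because `alpha` is passed as a function of the epoch, a variable-rate schedule that raises or lowers the dropout rate between iterations can be expressed by supplying a different function.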
Conclusion
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571)272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.C./Examiner, Art Unit 2122

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126