Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to 35 U.S.C. § 102 and § 103 have been considered but are moot because the arguments are directed to amended limitations that have not been previously examined.
Claim Objections
Claim 8 is objected to because of the following informalities:  “Fischer information matrix” should read “Fisher information matrix”.  Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 6 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites the limitation "the labeled" in line 2. There is insufficient antecedent basis for this limitation in the claim. For the purposes of prosecution, the examiner has interpreted "the labeled" as "the labeled objects".

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1, 3-6, 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Burbidge et al. (“Active Learning for Regression based on Query by Committee”, hereinafter Burbidge) in view of Friedman (“The Bayesian Structural EM Algorithm”) and in view of Gal et al. (“Deep Bayesian Active Learning with Image Data”, hereinafter Gal).
Regarding Claim 1
Burbidge teaches 
A method comprising: 
- initializing a plurality of deep neural networks by training the plurality of deep neural networks using different sets of labeled objects, ([§ 2] “Ten neural networks were each trained on random sub-samples of half the data.”; “random sub-samples” reads on “different sets of labeled objects”)
- providing [[millions of]] unlabeled objects as input to each of a plurality of deep neural networks; ([§ 2] “It is assumed that a set of unlabelled inputs is provided and we wish to query the labels of as few inputs as possible whilst minimizing the generalization error.” “Ten neural networks were each trained on random sub-samples of half the data. An unlabelled input with maximal variance over the committee is then selected.”)
- for each unlabeled object: obtaining a plurality of predictions for the unlabeled object, each prediction being obtained from one of the plurality of deep neural networks; ([§ 2] “An unlabelled input with maximal variance over the committee is then selected.” “They defined the ambiguity at an input point x as the variance in the predictions of the committee members:”)
- determining whether the plurality of predictions satisfy a diversity metric; ([§ 2] “They defined the ambiguity at an input point x as the variance in the predictions of the committee members:” “An unlabelled input with maximal variance over the committee is then selected.”; “ambiguity” reads on “diversity metric”)
- identifying the unlabeled object as an informative object when the predictions satisfy the diversity metric. ([§ 1] “In query learning, our goal is to provide criteria that a learning algorithm can employ to improve its performance by actively selecting data that are most informative” [§ 2] “An unlabelled input with maximal variance over the committee is then selected.”; “maximal variance” reads on “satisfy the diversity metric”)
- obtaining labels for the informative object ([§ 1] “any input on which the committee members make opposite predictions causes maximal disagreement and its label is queried.”; “input on which the committee members make opposite predictions” reads on “informative object”)
- retraining the plurality of deep neural networks using the labels and the informative object ([§ 2] “actively selecting instantiations of the input variables x that should be labelled and incorporated into the training set” “They propose to query at each step the label of the input for which the ambiguity is maximal,” “The networks were trained on the same set of labelled examples, starting from one labelled example and adding one labelled example at a time.”)
Burbidge does not distinctly disclose
- wherein the plurality of deep neural networks together represent an approximation to a Bayesian posterior after the training
	However, Friedman teaches
- wherein the plurality of deep neural networks together represent an approximation to a Bayesian posterior after the training ([§ 6] “we might attempt to directly follow the basic Bayesian principle as formulated in Eq. (1), and perform Bayesian model averaging. In this approach, members of the committee are weighted by their posterior probability.”; “Bayesian model averaging” reads on “approximation to a Bayesian posterior” and  “members of the committee” reads on “the plurality of deep neural networks” )
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge with Bayesian approximation of Friedman to achieve efficiency in approximation computation. ([§ 2.4] “we can efficiently compute approximations for these predictions (e.g., using the MAP approximation).”)
	The combination of Burbidge and Friedman does not appear to distinctly disclose
- wherein each deep neural network has at least a million parameters
- providing millions of unlabeled objects as input
	However, Gal teaches
- wherein each deep neural network has at least a million parameters ([§ 1] “New techniques such as dropout (Hinton et al., 2012; Srivastava et al., 2014) are used extensively to regularise these huge models, which often contain millions of parameters”)
- providing millions of unlabeled objects as input ([§ 5.4] “These models make further use of a (very) large unlabeled set of 49K images, and a large validation set of 5K-10K labelled images to tune model hyper-parameters and model structure (Rasmus et al., 2015).”; it is obvious that a system with a (very) large unlabeled set is capable of dealing with millions of objects as input)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge and Friedman with the large data processing of Gal to achieve improvement of data processing in active learning. ([§ 2.4] “Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches.”)
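For illustration only, the selection loop described in the passages quoted from Burbidge § 2 (committee members trained on random sub-samples, ambiguity defined as the variance of the committee's predictions, and the input with maximal variance queried and added to the training set) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: a least-squares line stands in for each neural network, and the names fit_line, train_committee, ambiguity, select_query and oracle are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = a*x + b; a stand-in for training one committee member."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    return lambda x: a * x + (my - a * mx)


def train_committee(labeled, k, rng):
    """Train k members, each on a random sub-sample of half the labeled data."""
    half = max(2, len(labeled) // 2)
    return [fit_line(rng.sample(labeled, half)) for _ in range(k)]


def ambiguity(committee, x):
    """Variance of the committee members' predictions at input x."""
    return statistics.pvariance([member(x) for member in committee])


def select_query(committee, pool):
    """Return the unlabeled input whose ambiguity is maximal."""
    return max(pool, key=lambda x: ambiguity(committee, x))


def oracle(x):
    """Hypothetical labeling source, used only to make the sketch runnable."""
    return x * x


rng = random.Random(0)
labeled = [(x, oracle(x)) for x in (0.0, 0.5, 1.0, 1.5)]
pool = [2.0, 3.0, 4.0, 10.0]
for _ in range(3):
    committee = train_committee(labeled, k=5, rng=rng)  # (re)train the committee
    x = select_query(committee, pool)                   # most ambiguous input
    pool.remove(x)
    labeled.append((x, oracle(x)))                      # query label, add to training set
```

Each pass retrains the committee on the enlarged labeled set, which mirrors the "adding one labelled example at a time" language quoted above.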

Regarding Claim 3
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 1 as cited above and Burbidge further teaches: 
- wherein the steps of providing the millions of unlabeled objects, obtaining the plurality of predictions, determining whether the plurality of predictions satisfy a diversity metric, and identifying the unlabeled object as informative are iterated until convergence in the plurality of deep neural networks is reached. ([§ 3] “The ambiguity (A) selection strategy selects x to maximize a(x), x is chosen from a pool of m = 1000 unlabelled examples. Results are averaged over 1000 runs.”; [§ 2] “An unlabelled input with maximal variance over the committee is then selected.”; “1000 runs” reads on “iterated”; “maximize a(x)” reads on “convergence”; “over the committee” reads on “the plurality of deep neural networks”)

Regarding Claim 4
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 3 as cited above and Burbidge further teaches: 
- wherein convergence is reached after a predetermined number of iterations. ([§ 3] “The ambiguity (A) selection strategy selects x to maximize a(x), x is chosen from a pool of m = 1000 unlabelled examples. Results are averaged over 1000 runs.”)

Regarding Claim 5
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 3 as cited above and Burbidge further teaches: 
- wherein convergence is reached when no unlabeled objects have a plurality of predictions that satisfy the diversity metric. ([§ 2] “Active learning was shown to require fewer queries than passive learning to reach agreement among the committee members.”; “reach agreement among the committee members” reads on “no unlabeled objects have a plurality of predictions that satisfy the diversity metric” because “agreement” means every classifier makes an identical prediction, which yields zero variance, and zero variance does not satisfy the diversity metric.)
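The two stopping conditions discussed for claims 4 and 5 (a predetermined number of iterations, or no unlabeled object whose predictions satisfy the diversity metric) can be sketched as a single check. This is an illustrative sketch only: it assumes the diversity metric is "prediction variance above a threshold", and the names informative and has_converged are hypothetical.

```python
import statistics


def informative(predictions, threshold):
    """Assumed diversity metric: variance of the committee's predictions exceeds threshold."""
    return statistics.pvariance(predictions) > threshold


def has_converged(pool_predictions, threshold, iteration, max_iterations):
    """Stop after a predetermined number of iterations, or when no object is informative."""
    if iteration >= max_iterations:
        return True
    return not any(informative(p, threshold) for p in pool_predictions)
```

With a threshold of zero, full agreement among the committee members on every unlabeled object gives zero variance everywhere, so the second condition is met.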

Regarding Claim 6
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 1 as cited above and Burbidge further teaches: 
- wherein the different sets of labeled objects apply different weights to the labeled ([§ 2] “The disagreement in the predictions of the individual committee members arises from the differing random initializations of the network weights.”; “random initializations” reads on “the different sets of labeled objects”)

Regarding Claim 10
Burbidge teaches 
A computer-readable medium storing a deep neural network ([§1] “machine learning algorithm” implies “neural network” code stored in a computer-readable storage medium) trained by
- initializing a committee of deep neural networks ([§2]“The committee consists of k networks and the output of network α on input x is yα(x)”, [§3] “a committee of k=5 learners is maintained”) by training the committee of deep neural networks using different sets of labeled training objects; ([§1] “An alternative approach is to use different subsets of the data, as in query by bagging and query by boosting”, [§3]“ For the active strategy, a committee of k = 5 learners is maintained. Each is trained on a subset of the labelled data by leaving out disjoint subsets of size [n/k] where n is the number of labelled data.”; “subset … leaving out disjoint subsets” reads on “different sets” because if there are n=1000 data, each learner is trained with a subset of size 200 (n=1000/k=5) samples which is exclusive from other subsets;)
- iteratively training the deep neural networks of the committee until convergence ([§2] “it is computationally more efficient to draw m candidate points from q(x) at each iteration and choose the best x from these”, “They propose to query at each step the label of the input for which the ambiguity is maximal”, “The networks were trained on the same set of labelled examples, starting from one labelled example and adding one labelled example at a time.”; “the best x” and “the ambiguity is maximal” reads on “convergence” ) by: 
- identifying a set of informative objects, by providing [[millions of]] unlabeled objects to the committee and selecting, as the set of informative objects, the unlabeled objects with highest diversity in the predictions obtained from the deep neural networks in the committee, ([§ 2] “An unlabelled input with maximal variance over the committee is then selected.” “They defined the ambiguity at an input point x as the variance in the predictions of the committee members:” “They propose to query at each step the label of the input for which the ambiguity is maximal,”; “maximal variance” reads on “highest diversity”)
- obtaining, for each informative object in the set of informative objects, a label for the informative object, and ([§ 2] “They propose to query at each step the label of the input for which the ambiguity is maximal”; “query .. the label” reads on “obtaining labels”)
- retraining the deep neural networks in the committee using the labels for the informative objects; ([§ 2] “actively selecting instantiations of the input variables x that should be labelled and incorporated into the training set”, “The networks were trained on the same set of labelled examples, starting from one labelled example and adding one labelled example at a time.”).
Burbidge does not distinctly disclose
- wherein the committee of deep neural networks together represent an approximation to a Bayesian posterior after the training
	However, Friedman teaches
- wherein the committee of deep neural networks together represent an approximation to a Bayesian posterior after the training ([§ 6] “we might attempt to directly follow the basic Bayesian principle as formulated in Eq. (1), and perform Bayesian model averaging. In this approach, members of the committee are weighted by their posterior probability.”; “Bayesian model averaging” reads on “approximation to a Bayesian posterior” and  “members of the committee” reads on “the plurality of deep neural networks” )
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge with Bayesian approximation of Friedman to achieve efficiency in approximation computation. ([§ 2.4] “we can efficiently compute approximations for these predictions (e.g., using the MAP approximation).”)
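For illustration only, the Bayesian model averaging passage quoted from Friedman § 6, in which committee members are weighted by their posterior probability, can be sketched as follows. The sketch assumes each member supplies a log-likelihood of the training data and, optionally, a log-prior; the names posterior_weights and model_average are hypothetical and do not come from the reference.

```python
import math


def posterior_weights(log_likelihoods, log_priors=None):
    """Normalize exp(log-likelihood + log-prior) into posterior weights over members."""
    if log_priors is None:
        log_priors = [0.0] * len(log_likelihoods)  # uniform prior over members
    scores = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    top = max(scores)  # subtract the maximum before exponentiating for stability
    unnormalized = [math.exp(s - top) for s in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def model_average(predictions, weights):
    """Posterior-weighted average of the members' predictions for one input."""
    return sum(w * p for w, p in zip(weights, predictions))
```

Members with equal posterior scores receive equal weights, in which case the weighted average reduces to a plain committee mean.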
	The combination of Burbidge and Friedman does not appear to distinctly disclose
- wherein each deep neural network has at least a million parameters
- providing millions of unlabeled objects 
	However, Gal teaches
- wherein each deep neural network has at least a million parameters ([§ 1] “New techniques such as dropout (Hinton et al., 2012; Srivastava et al., 2014) are used extensively to regularise these huge models, which often contain millions of parameters”)
- providing millions of unlabeled objects ([§ 5.4] “These models make further use of a (very) large unlabeled set of 49K images, and a large validation set of 5K-10K labelled images to tune model hyper-parameters and model structure (Rasmus et al., 2015).”; it is obvious that a system with a (very) large unlabeled set is capable of dealing with millions of objects as input)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge and Friedman with the large data processing of Gal to achieve improvement of data processing in active learning. ([§ 2.4] “Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches.”)

Regarding Claim 14
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 10 as cited above and Burbidge further teaches: 
- wherein the different sets of labeled training objects differ in weights assigned to the labeled training objects. ([§ 2] “The disagreement in the predictions of the individual committee members arises from the differing random initializations of the network weights.”; “random initializations” reads on “the different sets of labeled objects”)

Claims 2, 13, 16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal and further in view of Yankov et al. (US 20120095943 A1, hereinafter Yankov).

Regarding Claim 2
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 1 as cited above but does not distinctly disclose
- providing the informative object to a human rater to provide the label; 
	However, Yankov teaches
- providing the informative object to a human rater to provide the label ([0005] “Provided that there are a large number of unlabeled examples, it then selects an unlabeled example that it believes is “informative” and will improve the classification performance the most if its label is revealed. The example is then labeled by human editors and added to the initial training set.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with human labeling of Yankov to achieve precise labels thereby improving the accuracy of classifiers. ([Abstract] “and use editorially-labeled versions of the examples of the active set to re-train the classifiers, thereby improving the accuracy of at least some of the classifiers.”)

Regarding Claim 13
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 10 as cited above but does not distinctly disclose
- wherein for each iteration the set of informative objects is bounded by a predetermined quantity
	However, Yankov teaches
- wherein for each iteration the set of informative objects is bounded by a predetermined quantity. ([0041] “At block 620, therefore, the computer 110 selects the most-informative l examples (say the most-informative l=1000 examples) to be labeled at each iteration using an improved algorithm that selects the most-informative examples for the classifiers in the taxonomy as a whole.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with human labeling of Yankov to achieve precise labels thereby improving the accuracy of classifiers. ([Abstract] “and use editorially-labeled versions of the examples of the active set to re-train the classifiers, thereby improving the accuracy of at least some of the classifiers.”)
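For illustration only, bounding the per-iteration set of informative objects by a predetermined quantity, in the manner of the "most-informative l examples" quoted from Yankov [0041], can be sketched as a top-l selection. The sketch assumes informativeness is scored by prediction variance; the name select_batch and the dictionary layout are hypothetical.

```python
import heapq
import statistics


def select_batch(pool_predictions, limit):
    """Return the ids of at most `limit` objects with the highest prediction variance.

    pool_predictions maps an object id to the list of committee predictions for it.
    """
    scored = {oid: statistics.pvariance(p) for oid, p in pool_predictions.items()}
    return heapq.nlargest(limit, scored, key=scored.get)
```

The returned list never exceeds `limit` entries, and it is shorter only when the pool itself holds fewer objects.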


Regarding Claim 16
Burbidge teaches
- generating, from a set of labeled objects, a plurality of training sets, each training set differing from the other training sets in the plurality of training sets; ([§3]“Each is trained on a subset of the labelled data by leaving out disjoint subsets of size n/k, where n is the number of labelled data”; “subset … leaving out disjoint subsets” reads on “different sets”)
- assigning each deep neural network in a committee of deep neural networks a respective training set, [[each deep neural network having at least a million parameters]] and is assigned a different training set of the plurality of training sets; ([§2]“The committee consists of k networks and the output of network α on input x is yα(x)”, [§3]“ a committee of k = 5 learners is maintained. Each is trained on a subset of the labelled data by leaving out disjoint subsets of size n/k, where n is the number of labelled data”)
- initializing the committee by training the deep neural networks using respective training sets, [[wherein the committee of deep neural networks together represent an approximation to a Bayesian posterior after the training]];  ([§2]“The committee consists of k networks and the output of network α on input x is yα(x)”, [§3]“ a committee of k = 5 learners is maintained. Each is trained on a subset of the labelled data by leaving out disjoint subsets of size n/k, where n is the number of labelled data”)
- iteratively training the deep neural networks of the committee until convergence ([§2] “it is computationally more efficient to draw m candidate points from q(x) at each iteration and choose the best x from these”, “They propose to query at each step the label of the input for which the ambiguity is maximal”, “The networks were trained on the same set of labelled examples, starting from one labelled example and adding one labelled example at a time.”; “the best x” and “the ambiguity is maximal” reads on “convergence” ) by: 
- obtaining, for each unlabeled object from [[millions of]] unlabeled objects, a prediction from each deep neural network in the committee ([§ 2] “An unlabelled input with maximal variance over the committee is then selected.” “They defined the ambiguity at an input point x as the variance in the predictions of the committee members:”)
- identifying informative objects as the unlabeled objects with highest diversity in predictions from the committee of deep neural networks, ([§ 2] “An unlabelled input with maximal variance over the committee is then selected.” “They defined the ambiguity at an input point x as the variance in the predictions of the committee members:”; “maximal variance” reads on “highest diversity”)
- obtaining a respective label for each informative object of the informative objects ([§ 2] “They propose to query at each step the label of the input for which the ambiguity is maximal”; “query .. the label” reads on “obtaining labels”)
- retraining the deep neural networks with the respective labels and the informative objects; ([§ 2] “actively selecting instantiations of the input variables x that should be labelled and incorporated into the training set”, “The networks were trained on the same set of labelled examples, starting from one labelled example and adding one labelled example at a time.”).
Burbidge does not distinctly disclose
- wherein the committee of deep neural networks together represent an approximation to a Bayesian posterior after the training
	However, Friedman teaches
- wherein the committee of deep neural networks together represent an approximation to a Bayesian posterior after the training ([§ 6] “we might attempt to directly follow the basic Bayesian principle as formulated in Eq. (1), and perform Bayesian model averaging. In this approach, members of the committee are weighted by their posterior probability.”; “Bayesian model averaging” reads on “approximation to a Bayesian posterior” and  “members of the committee” reads on “the plurality of deep neural networks” )
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge with Bayesian approximation of Friedman to achieve efficiency in approximation computation. ([§ 2.4] “we can efficiently compute approximations for these predictions (e.g., using the MAP approximation).”)
	The combination of Burbidge and Friedman does not appear to distinctly disclose
- each deep neural network has at least a million parameters
- from millions of unlabeled objects 
	However, Gal teaches
- wherein each deep neural network has at least a million parameters ([§ 1] “New techniques such as dropout (Hinton et al., 2012; Srivastava et al., 2014) are used extensively to regularise these huge models, which often contain millions of parameters”)
- from millions of unlabeled objects ([§ 5.4] “These models make further use of a (very) large unlabeled set of 49K images, and a large validation set of 5K-10K labelled images to tune model hyper-parameters and model structure (Rasmus et al., 2015).”; it is obvious that a system with a (very) large unlabeled set is capable of dealing with millions of objects as input)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge and Friedman with the large data processing of Gal to achieve improvement of data processing in active learning. ([§ 2.4] “Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches.”)
	The combination of Burbidge, Friedman and Gal does not distinctly disclose
- using one of the deep neural networks to make predictions for unlabeled objects.
	However, Yankov teaches
- using one of the deep neural networks to make predictions for unlabeled objects. ([0043] “The classifiers are thus ready to be used on an unlabeled dataset for testing or to perform automated labeling for any number of applications, as discussed previously, depending on the types of classifiers that were trained.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with using updated classifiers of Yankov to make predictions by a retrained learner thereby improving the accuracy of classifiers. ( [Abstract] “and use editorially-labeled versions of the examples of the active set to re-train the classifiers, thereby improving the accuracy of at least some of the classifiers.”)

Regarding Claim 18
The combination of Burbidge, Friedman, Gal and Yankov teaches all of the limitations of claim 16 as cited above and Burbidge further teaches: 
- wherein generating the plurality of training sets includes randomized reweighting labeled objects in the set of labeled objects. ([§ 2] “The disagreement in the predictions of the individual committee members arises from the differing random initializations of the network weights.”; “random initializations of the network weights” reads on “randomized reweighting labeled objects”)

Regarding Claim 19
The combination of Burbidge, Friedman, Gal and Yankov teaches all of the limitations of claim 16 as cited above and Yankov further teaches: 
- wherein obtaining the respective label for each informative object includes:  receiving a label from each of a plurality of human raters; and aggregating the labels. ([0005] “The example is then labeled by human editors and added to the initial training set.”; “added” reads on “aggregating”)
	Same motivation as claim 16.
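For illustration only, one way labels received from a plurality of human raters could be aggregated is a majority vote. This is an assumption made for the sketch: neither the claim language quoted above nor the quoted passage of Yankov specifies a particular aggregation rule, and the name aggregate_labels is hypothetical.

```python
from collections import Counter


def aggregate_labels(rater_labels):
    """Return the label chosen by the most raters (ties go to the label seen first)."""
    if not rater_labels:
        raise ValueError("at least one rater label is required")
    return Counter(rater_labels).most_common(1)[0][0]
```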

Regarding Claim 20
The combination of Burbidge, Friedman, Gal and Yankov teaches all of the limitations of claim 16 as cited above and Burbidge further teaches: 
- generating the plurality of training sets includes randomized subsampling of the set of labeled objects. ([§2] “Ten neural networks were each trained on random sub-samples of half the data. ”; “trained” implies the data are labeled.)

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal and in view of Zhu et al. (“Bayesian Neural Networks Based Bootstrap Aggregating for Tropical Cyclone Tracks Prediction in South China Sea”, hereinafter Zhu)

Regarding Claim 7
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 1 as cited above but does not distinctly disclose 
- initializing the plurality of deep neural networks using Bayesian bootstrapping.
	However, Zhu teaches
- initializing the plurality of deep neural networks using Bayesian bootstrapping. ([Abstract] “The model proposed in this study is a Bayesian Neural Network (BNN) based committee machine using bagging (bootstrap aggregating). Two-layered Bayesian neural networks are employed as committee members in the committee machine.” [§3.2 par. 3] “So, the process of bootstrap sampling is independently executed for 30 times to produce a different training set for each committee member.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with Bayesian bootstrapping of Zhu to achieve well-distributed sampling thereby increasing accuracy of prediction. ( [0005] “The goal is to minimize the time that the subject matter expert(s) spend in training the system” [§3.3] “the ensemble performance of members is generally superior to individual committee member in prediction accuracy.” )
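For illustration only, bootstrap sampling of the kind quoted from Zhu § 3.2 and a weight-based Bayesian bootstrap can be sketched side by side. The sketch assumes the textbook formulations: the ordinary bootstrap resamples objects with replacement, while the Bayesian bootstrap draws continuous weights from a flat Dirichlet distribution, obtained here by normalizing independent Exp(1) draws. The function names are hypothetical, and the sketch does not assert which formulation any cited reference or claim uses.

```python
import random


def bootstrap_counts(n, rng):
    """Ordinary bootstrap: resample n indices with replacement, return per-object counts."""
    counts = [0] * n
    for i in rng.choices(range(n), k=n):
        counts[i] += 1
    return counts


def bayesian_bootstrap_weights(n, rng):
    """Bayesian bootstrap: weights ~ Dirichlet(1, ..., 1) via normalized Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
# One independent weighting of the same six labeled objects per committee member.
committee_weights = [bayesian_bootstrap_weights(6, rng) for _ in range(3)]
```

Either scheme yields a different effective training set for each committee member; they differ in whether an object receives an integer count or a continuous weight.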

Regarding Claim 15
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 10 as cited above but does not distinctly disclose 
- wherein the different sets of labeled training objects are generated via Bayesian bootstrapping.
	However, Zhu teaches
- wherein the different sets of labeled training objects are generated via Bayesian bootstrapping. ([Abstract] “The model proposed in this study is a Bayesian Neural Network (BNN) based committee machine using bagging (bootstrap aggregating). Two-layered Bayesian neural networks are employed as committee members in the committee machine.” [§3.2 par. 3] “So, the process of bootstrap sampling is independently executed for 30 times to produce a different training set for each committee member.”)
Same motivation as claim 7.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal and in view of Houlsby et al. (“Bayesian Active Learning for Classification and Preference Learning”, hereinafter Houlsby)

Regarding Claim 8
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 1 as cited above but does not distinctly disclose 
- initializing the plurality of deep neural networks using a Laplace approximation.
	However, Houlsby teaches
- initializing the plurality of deep neural networks using a Laplace approximation. ([§Conclusion] “This allows us to choose from a whole range of approximate inference methods, including EP, the Laplace approximation, ADF or even sparse online learning, and thereby make the trade off between computational complexity and accuracy.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with Laplace approximation of Houlsby to achieve well-distributed sampling thereby achieving better prediction performance. ( [Abstract] “Our experimental performance compares favourably to many popular active learning algorithms, and has equal or lower computational complexity”)

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal in view of Houlsby in view of Povey et al. (“PARALLEL TRAINING OF DNNS WITH NATURAL GRADIENT AND PARAMETER AVERAGING”, hereinafter Povey) and further in view of Lira et al. (“Combining Multiple Artificial Neural Networks Using Random Committee to Decide upon Electrical Disturbance Classification”, hereinafter Lira)
Regarding Claim 9
The combination of Burbidge, Friedman, Gal and Houlsby teaches all of the limitations of claim 8 as cited above but does not distinctly disclose 
- using the Laplace approximation comprises using a Fischer information matrix and a source neural network to draw random neural network samples with randomized parameters, wherein each random neural network sample is a deep neural network of the plurality of deep neural networks
	However, Povey teaches
- using the Laplace approximation comprises using a Fischer information matrix and [[a source neural network to draw random neural network samples]] with randomized parameters, wherein each random neural network sample is a deep neural network of the plurality of deep neural networks ([§ 5.2] “It is necessary for the Fisher matrix to be randomly chosen independent of the identity of the current sample.” [§ C.7] “we noticed that sometimes on an outer iteration immediately following the random initialization of parameters (including the first outer iteration)”; [Abstract] “We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machines”; “DNNs” reads on “the plurality of deep neural networks”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman, Gal and Houlsby with Fisher information matrix of Povey to achieve improvement in convergence of stochastic gradient descent. ([Abstract] “an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.”)
The combination of Burbidge, Friedman, Gal, Houlsby and Povey does not appear to distinctly disclose
- a source neural network to draw random neural network samples
	However, Lira teaches
- a source neural network to draw random neural network samples ([Abstract] “Network combination was formed using random committee which builds an ensemble of randomized base classifiers.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman, Gal, Houlsby and Povey with random committee of Lira to achieve improvement in classification accuracy. ([Abstract] “Experimental results with real data indicate that the random committee is clearly an effective way to improve disturbance classification accuracy when compared with the simple average and the individual models.”)
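For illustration only, the claimed limitation of drawing random neural network samples from a source neural network using a Fisher information matrix can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the application or of any cited reference: it assumes a diagonal Fisher estimate (mean squared per-example gradient) and a Gaussian Laplace posterior whose precision is `num_examples * F + prior_precision`. The names `diagonal_fisher`, `draw_network_samples` and `prior_precision`, and the toy gradient values, are hypothetical.

```python
import math
import random


def diagonal_fisher(per_example_grads):
    """Estimate a diagonal Fisher information matrix as the mean squared gradient."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def draw_network_samples(source_params, fisher_diag, num_samples, num_examples,
                         prior_precision=1.0, seed=0):
    """Draw parameter vectors from a Gaussian centered on the source network.

    Assumed Laplace approximation: each parameter has variance
    1 / (num_examples * F_ii + prior_precision).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        sample = [
            rng.gauss(mean, 1.0 / math.sqrt(num_examples * f + prior_precision))
            for mean, f in zip(source_params, fisher_diag)
        ]
        samples.append(sample)
    return samples


# Toy values (hypothetical): parameters of a trained source network and
# per-example gradients of its log-likelihood.
source = [0.5, -1.2, 2.0]
grads = [[0.1, -0.3, 0.2], [0.2, 0.1, -0.4], [-0.1, 0.2, 0.3]]
fisher = diagonal_fisher(grads)
committee = draw_network_samples(source, fisher, num_samples=5,
                                 num_examples=len(grads))
```

Each entry of `committee` is one randomized parameter vector, that is, one sampled network; a full implementation would load each vector into a copy of the source network's architecture.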

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal in view of Povey and further in view of Lira.
Regarding Claim 11
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 10 as cited above but does not distinctly disclose 
- wherein initializing the committee includes using a Fischer information matrix and a source neural network and to draw random neural network samples with randomized parameters, wherein each random neural network sample is a deep neural network of the committee
	However, Povey teaches
- wherein initializing the committee includes using a Fischer information matrix and [[a source neural network and to draw random neural network samples]] with randomized parameters, wherein each random neural network sample is a deep neural network of the committee ([§ 5.2] “It is necessary for the Fisher matrix to be randomly chosen independent of the identity of the current sample.” [§ C.7] “we noticed that sometimes on an outer iteration immediately following the random initialization of parameters (including the first outer iteration)”; [Abstract] “We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machines”; “DNNs” reads on “committee”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with Fisher information matrix of Povey to achieve improvement in convergence of stochastic gradient descent. ([Abstract] “an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.”)
The combination of Burbidge, Friedman, Gal and Povey does not appear to distinctly disclose
- a source neural network to draw random neural network samples
	However, Lira teaches
- a source neural network to draw random neural network samples ([Abstract] “Network combination was formed using random committee which builds an ensemble of randomized base classifiers.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman, Gal and Povey with random committee of Lira to achieve improvement in classification accuracy. ([Abstract] “Experimental results with real data indicate that the random committee is clearly an effective way to improve disturbance classification accuracy when compared with the simple average and the individual models.”)

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal and further in view of Johnson.

Regarding Claim 12
The combination of Burbidge, Friedman and Gal teaches all of the limitations of claim 10 as cited above but does not distinctly disclose 
- wherein convergence is reached when diversity in the predictions of the deep neural networks fails to meet a diversity threshold.
	However, Johnson teaches
- wherein convergence is reached when diversity in the predictions of the deep neural networks fails to meet a diversity threshold. ([0007] “In another example, the apparatus can be operable to implement a stopping criteria process that is based on a Kappa agreement (equations 1 and 2, or technique 3) or a more general process (technique 3′).” [0057] “The Kappa agreement measures how much these two hyperplanes h and h′ agree on their prediction of labels on a carefully chosen test set.”; “Kappa agreement” reads on “fails to meet a diversity threshold” since Kappa agreement means the predictions from classifiers are not diverse.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman and Gal with diversity threshold of Johnson to minimize training time thereby increasing efficiency of classifiers. (Johnson [0005] “The goal is to minimize the time that the subject matter expert(s) spend in training the system” [0044] “A study conducted by [reference 34] shows that TAR methods can be more effective and efficient than traditional e-discovery practice.” )
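For illustration only, the relationship between a Kappa agreement and a diversity threshold can be sketched as follows. This is a minimal sketch, not Johnson's equations 1 and 2 or techniques 3 and 3′: it assumes Cohen's kappa as the agreement measure and, as a further assumption, defines diversity as one minus the mean pairwise kappa across committee members. The names `cohen_kappa`, `has_converged` and `diversity_threshold`, and the toy predictions, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def has_converged(committee_predictions, diversity_threshold):
    """Return True when diversity (1 - mean pairwise kappa) is below the threshold."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(committee_predictions, 2)]
    diversity = 1.0 - sum(kappas) / len(kappas)
    return diversity < diversity_threshold


# Toy predictions (hypothetical) from three committee members on four objects.
predictions = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
```

With these toy predictions the pairwise kappas are 1.0, 0.5 and 0.5, so the diversity is about 0.33: `has_converged(predictions, 0.5)` is true, while `has_converged(predictions, 0.2)` is false.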

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Burbidge in view of Friedman in view of Gal in view of Yankov and further in view of Zhu.

Regarding Claim 17
The combination of Burbidge, Friedman, Gal and Yankov teaches all of the limitations of claim 16 as cited above but does not distinctly disclose 
- wherein generating the plurality of training sets includes generating the plurality of training sets via Bayesian bootstrapping
	However, Zhu teaches
- wherein generating the plurality of training sets includes generating the plurality of training sets via Bayesian bootstrapping ([Abstract] “The model proposed in this study is a Bayesian Neural Network (BNN) based committee machine using bagging (bootstrap aggregating).” [§3.2 par. 3] “So, the process of bootstrap sampling is independently executed for 30 times to produce a different training set for each committee member.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the active learning system of Burbidge, Friedman, Gal and Yankov with Bayesian bootstrapping of Zhu to achieve well-distributed sampling thereby increasing accuracy of prediction. (Zhu [§3.3] “the ensemble performance of members is generally superior to individual committee member in prediction accuracy.”)
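For illustration only, generating one training set per committee member by Bayesian bootstrapping can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of the application or of Zhu: it assumes the Dirichlet-weight formulation of the Bayesian bootstrap, in which each member receives weights drawn from Dirichlet(1, ..., 1) over the labeled examples, and, as a further assumption, it materializes each training set by resampling with those weights. The names `bayesian_bootstrap_weights` and `generate_training_sets`, and the toy examples, are hypothetical.

```python
import random


def bayesian_bootstrap_weights(num_examples, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalizing Exponential(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(num_examples)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_training_sets(examples, num_members, seed=0):
    """Build one weighted resample of the labeled examples per committee member."""
    rng = random.Random(seed)
    training_sets = []
    for _ in range(num_members):
        weights = bayesian_bootstrap_weights(len(examples), rng)
        # Assumption: approximate the weighted data set by sampling with replacement.
        resampled = rng.choices(examples, weights=weights, k=len(examples))
        training_sets.append(resampled)
    return training_sets


# Toy labeled examples (hypothetical): (feature, label) pairs.
examples = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.5, 1)]
training_sets = generate_training_sets(examples, num_members=3)
example_weights = bayesian_bootstrap_weights(len(examples), random.Random(1))
```

An ordinary bootstrap would instead draw each example with equal probability; the only difference in this sketch is the random Dirichlet weighting applied before resampling.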

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUNG WON LEE whose telephone number is (571)272-8508. The examiner can normally be reached Mon-Fri 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY can be reached on 303-297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SUNG W LEE/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129