DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/03/2021 has been entered.
 
Response to Amendments
The amendments filed 05/03/2021 have been entered. Claims 1-20 remain pending in the application. 
 
Response to Arguments
Applicant's arguments filed 05/03/2021 with respect to the rejection under 35 U.S.C. 102 have been fully considered and, solely in view of the current amendments, are persuasive. However, upon further consideration, new ground(s) of rejection have been made under 35 U.S.C. 103. 
Applicant's arguments filed 05/03/2021 with respect to 35 U.S.C. 103 have been fully considered but are not persuasive. 
	The amendments filed 05/03/2021 contain subject matter that has not been previously presented. Therefore, applicant’s arguments regarding such amendments are rendered moot. The examiner refers to the rejection under 35 U.S.C. 103 for more details.  
Claim Objections
Claims 1-20 are objected to because of the following informalities:

The claims appear to use the terms “feedback”, “received feedback”, and “stored received feedback” interchangeably. While it is believed that each instance is definite, the wording chosen is inconsistent, which could lead to confusion. 
For example, Claim 4 recites “…and based at least in part on the feedback…” Again, while this language appears definite, it is unclear if “the feedback” refers to “the received feedback” and/or the “stored received feedback”. The examiner notes that each and every instance of “feedback”, “received feedback”, and “stored received feedback” will be interpreted as encompassing the same or otherwise equivalent subject matter. 
 Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
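This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: 
1. “a machine learning system configured to…” in claim 2.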
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.

Claim limitation “a machine learning system” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. 
As described above, “A machine learning system” passes all three prongs of the 112(f) test and thus invokes interpretation under 112(f). 
However, neither sufficient structure nor a corresponding algorithm has been identified or clearly linked to the claimed “a machine learning system configured to…” 
Specifically, the claim element “machine learning system” appears to be disclosed only in the background of the as-filed specification. Of particular note are paragraphs [0002]-[0003] which, at least in part, recite: 
“Various types of machine learning system are known including neural networks, support vector machines, random decision forests and others. 
[0003] Machine learning systems are often trained in an offline training…Online training refers to training which occurs together with or as a part of test time operation of a machine learning system…”

However, as can be seen there is no description of what structure or corresponding algorithm performs the functions of Claim 2. 
The MPEP is explicit about disclosing general algorithms in terms of the requirements under 112(f) and 112(b). 
MPEP 2181(II)(B) states that a rejection under 35 U.S.C. 112(b) is appropriate if the specification discloses no corresponding algorithm associated with a computer or microprocessor. Also MPEP 2181(II)(B) states, 
“the specification must explicitly disclose the algorithm for performing the claimed function, and simply reciting the claimed function in the specification will not be a sufficient disclosure for an algorithm which, by definition, must contain a sequence of steps. Blackboard, 574 F.3d at 1384, 91 USPQ2d at 1492 (stating that language that simply describes the function to be performed describes an outcome, not a means for achieving that outcome); Microsoft Computer Dictionary, Microsoft Press, 5th edition, 2002; see also Encyclopaedia Britannica, Inc. v. Alpine Elecs., Inc., 355 Fed. App'x 389, 394-95 (Fed. Cir. 2009) (holding that implicit or inherent disclosure of a class of algorithms for performing the claimed functions is not sufficient…”

Paragraph [0002] merely recites “Various types of machine learning system are known including neural networks, support vector machines, random decision forests and others…” This statement fails to provide an algorithm because, at most, it implicitly or inherently discloses a class of algorithms for performing the claimed function of “…to carry out, as the update, an online update comprising receiving the feedback and computing the second aggregated prediction as part of a machine learning operation to compute predictions from the unseen sensor data…” 
	
	Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph; 

(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim 2 recites: 
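“The sensor data process of claim 1 wherein the processor comprises a machine learning system configured to carry out, as the update, an online update comprising receiving the feedback and computing the second aggregated prediction as part of a machine learning operation to compute predictions from the unseen sensor data…”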
Claim 2 has several issues. 
1. “…configured to carry out, as the update….” Claim 1 contains two update steps and thus, when claim 2 recites “the update” it is unclear which update is “the update.” 
2. “…as part of a machine learning operation…” It is unclear what “a machine learning operation” refers to. From the claim language it appears that claim 2 is modifying at least one of the update steps of claim 1. Therefore, when claim 2 recites “as part of a machine learning operation”, it is unclear whether the online update of claim 2 is separate from, or the same as, any or all of the update steps in Claim 1. The examiner requests clarification. 
3. In the previous response the examiner noted the confusion between Claim 1, which appears, contextually, to perform steps of a machine learning system, and the explicit call out of “a machine learning system” (again from the previous set of claims). In the instant set of claims, the confusion continues. That is, claim 2 now states that the processor of claim 1 “comprises a machine learning system” that apparently carries out ONLY one of the update steps of Claim 1. This functionality is confusing and the examiner requests clarification. 
That is, from the plain reading of Claim 2 it appears that, when considered in combination with Claim 1, there are two systems: 
	1. The “sensor data processor” which completes the steps of claim 1 including the update steps. 

Again, the examiner requests clarification. 

To put it a different way, and as the examiner interprets the claim under BRI in light of the specification, Claim 2 merely modifies one of the update steps of Claim 1 such that it is an online update. 
	Below is a suggested (i.e., not required) amendment to claim 2 which the examiner believes captures the applicant’s intended subject matter while remaining definite. However, because these amendments are only suggestions, the rejection under 112(b) for the reasons above remains. 
	As understood by the examiner, Claim 2 reads as follows (i.e., under BRI in light of the specification): 
	The sensor data process of Claim 1, wherein [the update of the weight associated with each trained expert model] is an online update. 
	OR 
	The sensor data processor of Claim 1, wherein [the update of the plurality of trained expert models, in the training stage] is an online update. 

	The above suggested amendments are the examiner’s BRI in light of the specification. 

Appropriate correction is required. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


For clarity of record and ease of reading, the examiner notes the following: 
Any text that is bolded is a limitation of a claim. 
The “teaching” or reference citation, along with any necessary examiner notes are contained within the parentheses “()” following the bolded claim language. 
Any text that is underlined is emphasized language from reference(s) used and/or particular important examiner notes. While NOT fully reflective of the rejection as a whole, these underlined passages are indicative or otherwise reflective of key evidence.   

Claim(s) 1-5, 7, 14-16, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stiber et al. ("Site-Specific Updating and Aggregation of Bayesian Belief Network Models for Multiple Experts.") in view of Chyzhyk et al. (“An active learning approach for stroke lesion segmentation on multimodal MRI data.” NPL 2014).

With respect to Claim 1, Stiber teaches a sensor data processor comprising: a memory storing a plurality of trained expert models (Pg. 1531 Section 2.1 Col. 1 "For an expert system with J individual expert models, let Mj denote the expert model j, where j = 1, J.")
Stiber also teaches a processor configured to receive an unseen sensor data example and, for each trained expert model, compute a prediction from the unseen sensor data example using the trained expert model (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." The examiner notes that a person of ordinary skill in the art would readily infer that if an individual expert makes a prediction then that individual expert MUST have received data. The examiner also notes Section 2.1 Col. 2 “These J expert models can be updated with the evidence x to produce the value of [event E given that expert model j is correct]…” The examiner notes that “evidence x” teaches “unseen sensor data example”.).
Stiber further teaches aggregate the predictions to form an aggregated prediction (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." In addition or alternatively note Equation 3. This equation is defined as the probability-weighted aggregate prediction of the Event E, under the prior information state.  Note especially the P0(Mj) which is probability that each expert model is correct and that "If all J expert models are considered to be equally likely in the prior, then P0(Mj) = 1/J for all J expert models. In this case, the aggregate prediction for an event is the simple average of the predicted probabilities of the J expert models."). 
Stiber further teaches receive feedback relating to the aggregated prediction… (Pg. 1531 Section 2.1 Col. 2 "When evidence x is observed, this will determine the P(E|Mj) for each expert model and modify the probability that each expert model is correct, P(Mj)." The examiner notes that while “relating to” is believed to be definite, this language lends itself to broad interpretation. The examiner notes that calculating an updated probability that each model is correct when evidence x is observed teaches “receive feedback [e.g. the updated calculated probability] relating to the aggregated prediction”.). 
Stiber further teaches store the received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required.). 
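By way of illustration only, and not as part of Stiber or of the rejection, the K sequential evaluations quoted above amount to a chain-rule product: the likelihood of K observed events is the product of each event's predicted probability under the model as updated by all earlier events. The sketch below substitutes a simple Beta-Bernoulli expert for Stiber's BBN model. That substitution, and the names `sequential_likelihood`, `alpha`, and `beta`, are assumptions made solely to keep the example short.

```python
from fractions import Fraction


def sequential_likelihood(alpha, beta, events):
    """Likelihood of a sequence of observed events under one expert.

    Assumption: the expert is a Beta-Bernoulli model with pseudo-counts
    (alpha, beta), standing in for a BBN expert model. The expert is
    evaluated once per event (K times in total), and each evaluation uses
    the model as updated by all previously considered events.
    """
    a, b = Fraction(alpha), Fraction(beta)
    likelihood = Fraction(1)
    for occurred in events:
        # Predicted probability of this event under the current model.
        p_occurs = a / (a + b)
        likelihood *= p_occurs if occurred else 1 - p_occurs
        # Update the model with the knowledge of this event; the updated
        # counts are the stored record of all earlier events.
        if occurred:
            a += 1
        else:
            b += 1
    return likelihood


# Three observed events evaluated under a uniform prior expert.
example = sequential_likelihood(1, 1, [True, True, False])
```

In this stand-in model the result does not depend on the order in which the events are considered, but each evaluation does require the knowledge accumulated from the earlier events.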
Stiber further teaches update, for each trained expert model, a weight associated with the trained expert model, using the received feedback (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model. This weight should be equal to the relative probability that each individual expert model is correct given the evidence that has thus far been observed." The examiner notes that the relative probability based on the evidence that has thus far been observed teaches "update, for each trained expert model a weight associated with the trained expert model, using the received feedback".). 
Stiber further teaches compute a second aggregated prediction by computing an aggregation of the predictions which takes into account the weights (Pg. 1531 Note Equation 1, which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the "probability that expert model j is correct. Each P(Mj) is a weighting factor that is applied to its respective P(E|Mj)." The examiner further notes that the probability “P(Mj)” is calculated and updated when evidence x is observed. Calculating that updated probability, and the resulting update of the aggregated prediction because of the new evidence x, teaches “compute a second aggregated prediction by computing an aggregation of the predictions which takes into account the weights”.). 
Stiber further teaches update, in a training stage, the plurality of trained expert models, using the stored received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required.). 
Stiber, however, does not appear to explicitly disclose: 
…wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests
Chyzhyk, however, does teach …wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests (Chyzhyk Section 3.2 Active learning. Note algorithm 1 which shows “Active learning general algorithm”. Note the algorithm inputs especially “Unlabeled test data Uk…” The examiner notes that this set of Unlabeled test data teaches “unseen data example”. Next, note line 3 which shows that test samples (e.g. unseen data) are evaluated using active learning. That is, each test sample (e.g. unseen data) is classified according to the “current training set” (e.g. see line 2) or otherwise compared to the current training set. As a result of this (e.g. see line 4) each test sample has its respective uncertainty evaluated. From the paragraph below Algorithm 1, this uncertainty evaluation is described: “The estimation of each unlabeled sample uncertainty follows a committee approach…because we will be using [Random Forest] RF as the classifier model. Assume that we have built a committee classifiers, i.e. a RF with T trees, so that classification is provided by the majority voting. The committee provides T labels for each candidate sample…so that the uncertainty of its classification…” … most uncertain pixels (e.g. samples) are shown to the oracle (e.g. human with expert knowledge) and a label is assigned to those selected uncertain pixels. The examiner notes that an oracle (e.g. human with expert knowledge) interacting with the samples that the algorithm has deemed the most uncertain and the oracle providing a class for those most uncertain samples teaches “wherein the feedback [e.g. human providing a label] comprises a plurality of truth labels [e.g. the label assigned to each of the most uncertain pixels] for a subset [e.g. the most uncertain pixels] for the unseen data example [e.g. unlabeled test samples] corresponding to failed tests [e.g. it has not been successfully classified and/or the classifier is not confident in its predictions].”). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the multiple expert Bayesian network as taught by Stiber with the active learning and oracle feedback as taught by Chyzhyk because using active learning on the most uncertain samples increases the rate at which the classifier converges to the human expert. As a result, fewer iterations are required, which decreases the training time (Chyzhyk Pg. 33 Col 2). 
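For illustration only, the committee-based selection relied upon from Chyzhyk can be sketched as follows. The passage quoted above elides the exact uncertainty measure, so the vote entropy used here is an assumption rather than Chyzhyk's stated formula. The names `vote_entropy` and `select_for_oracle`, and the sample identifiers, are likewise hypothetical.

```python
from collections import Counter
from math import log


def vote_entropy(votes):
    """Disagreement among the T committee labels for one sample.

    Assumption: entropy of the vote distribution is used as the
    uncertainty score; a unanimous committee scores zero.
    """
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * log(c / total) for c in counts.values())


def select_for_oracle(committee_votes, k):
    """Return the k sample ids whose committee votes disagree the most.

    These are the samples that would be shown to the oracle (a human
    expert) so that a truth label can be assigned to each of them.
    """
    ranked = sorted(
        committee_votes,
        key=lambda sample_id: vote_entropy(committee_votes[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical votes from a committee of T = 4 trees on three pixels.
votes = {
    "px0": [1, 1, 1, 1],  # unanimous: the classifier is confident
    "px1": [1, 0, 1, 0],  # evenly split: most uncertain
    "px2": [1, 1, 1, 0],  # mild disagreement
}
to_label = select_for_oracle(votes, 2)
```

Only the selected subset is labeled by the oracle, which corresponds to the examiner's reading of truth labels provided for a subset of the unlabeled samples.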
With respect to Claim 2, the combination of Stiber and Chyzhyk teaches wherein the processor comprises a machine learning system configured to carry out, as the update, an online update comprising receiving the feedback and computing the second aggregated prediction as part of a machine learning operation to compute predictions from the unseen data (As an initial note, the … The examiner notes that because the update happens "when evidence x is observed" then, as a person of ordinary skill in the art would readily infer, that update must be "part of the operation of the machine learning system" and the system must be online.). 
With respect to Claim 3, the combination of Stiber and Chyzhyk teaches wherein the processor is configured to set initial values of the weights to the same value (Stiber Pg. 1531 note Equation 3. This equation is defined as the probability-weighted aggregate prediction of the Event E, under the prior information state.  Note especially the P0(Mj) which is probability that each expert model is correct and that "If all J expert models are considered to be equally likely in the prior, then P0(Mj) = 1/J for all J expert models. In this case, the aggregate prediction for an event is the simple average of the predicted probabilities of the J expert models." The examiner notes that a person of ordinary skill in the art would readily infer that if "all j expert models are considered to be equally likely..." then the initial weights must be the same value.).
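As a brief worked illustration of the quoted statement, written in the notation of the citations above (the exact typesetting of Stiber's Equation 3 is not reproduced here and the form below is an assumption), substituting equal prior weights into the probability-weighted aggregate yields the simple average:

```latex
\[
P_0(E) \;=\; \sum_{j=1}^{J} P(E \mid M_j)\, P_0(M_j)
       \;=\; \sum_{j=1}^{J} P(E \mid M_j)\,\frac{1}{J}
       \;=\; \frac{1}{J} \sum_{j=1}^{J} P(E \mid M_j)
\]
```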
With respect to Claim 4, the combination of Stiber and Chyzhyk teaches wherein the processor is configured to represent aggregation of the trained expert models using a probabilistic model and to update the weights using the probabilistic model and based at least in part on the feedback (Pg. 1531 Section 2.1 Equation 1, which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the … Additionally, note that the system of Stiber, as a whole, is at least a Bayesian Belief Network, which, necessarily and by definition, is a probabilistic model.). 
With respect to Claim 5, the combination of Stiber and Chyzhyk teaches wherein the processor is configured to compute each weight as a prior probability of the prediction being from one of the trained expert models times the likelihood of the feedback (Stiber Pg. 1531 Col. 2 "The prior probability P0j(A) and the conditional probabilities can be obtained directly from each of the J individual BBN expert models." Further note at least Equation 4: in the numerator of the equation, the probability is multiplied by the prior (P0) probability. The examiner notes that the weight being updated based on a multiplication of the prior probability (P0) and the probability that model j is correct given evidence x teaches “wherein the processor is configured to compute each weight as a prior probability of the prediction being from one of the trained expert models times the likelihood of the feedback”.). 
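For illustration only, the prior-times-likelihood weighting discussed above can be sketched as a normalized Bayesian update followed by a weighted aggregation. This is a minimal sketch and not a reproduction of Stiber's Equations 1 or 4: the function names and all numeric values are hypothetical, and each expert's prediction P(E|Mj) is held fixed across the update for simplicity.

```python
from fractions import Fraction


def update_weights(priors, likelihoods):
    """Weight of each expert: prior times likelihood, then normalized."""
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("evidence has zero likelihood under every expert")
    return [u / total for u in unnormalized]


def aggregate(predictions, weights):
    """Probability-weighted aggregate of the experts' predictions."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical values for J = 3 expert models.
priors = [Fraction(1, 3)] * 3                                    # P0(Mj) = 1/J
predictions = [Fraction(9, 10), Fraction(1, 2), Fraction(1, 10)]  # P(E|Mj)
likelihoods = [Fraction(4, 5), Fraction(2, 5), Fraction(1, 5)]    # P(x|Mj)

first_aggregate = aggregate(predictions, priors)    # before any evidence
weights = update_weights(priors, likelihoods)       # after evidence x
second_aggregate = aggregate(predictions, weights)  # reweighted prediction
```

With these values the updated weights are 4/7, 2/7, and 1/7, so the aggregate moves from 1/2 toward the prediction of the expert that best explained the evidence.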
With respect to Claim 7, the combination of Stiber and Chyzhyk teaches wherein each of the predictions comprises a plurality of elements of the unseen sensor data corresponding thereto, wherein computing the second aggregated prediction comprises computing an aggregation of initial elements of the plurality of elements using the updated weights, and wherein the initial elements are selected using the feedback and the initial element includes less than all of the elements of the predictions (Pg. 1531 Col. 2 "With a BBN-based expert system, this likelihood function is readily calculated. For example, if the evidence x consists of 
With respect to Claim 14, Stiber teaches all the limitations of Claim 1 as discussed above. 
Stiber, however, does not explicitly disclose wherein the unseen sensor data example is a medical image comprising a medical image volume and wherein the feedback about the aggregated prediction is related to a slice of the medical image volume and wherein the second aggregated prediction is a medical image volume.
Chyzhyk, however, does teach wherein the unseen sensor data example is a medical image representing a medical image volume and wherein the feedback about the aggregated prediction corresponds to a slice of the medical image volume (Chyzhyk Pg. 29 Figure 1 shows various volume rendered images of a brain. This reads on the claimed “medical image representing a medical image volume”. The examiner further notes that each image presented is described as a slice. For example, Pg. 29 Col. 1 Section 4 recites “Fig. 1 illustrates the variety of the imaging data by 

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the multiple experts and aggregated prediction as taught by Stiber modified with the medical image and medical feedback as taught by Chyzhyk because this would lead to faster diagnosis times for stroke lesions (Chyzhyk Pg. 1 and 28). 

With respect to Claim 15, Stiber teaches a computer-implemented method of online update of a trained machine learning system comprising a plurality of trained expert models, the method comprising (Pg. 1531 Section 2.1 Col. 1 "For an expert system with J individual expert models, let Mj denote the expert model j, where j = 1, J.")
Stiber also teaches receiving, at a processor, an unseen sensor data example; for each trained expert model, computing a prediction from the unseen sensor data example using the trained expert model (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." The examiner notes that a person of ordinary skill in the art would readily infer that if an individual expert makes a prediction then that individual expert MUST have received data.).
Stiber further teaches aggregating the predictions to form an aggregated prediction (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." In addition or alternatively note Equation 3. This equation is defined as the probability-weighted aggregate prediction of the Event E, under the prior information state.  Note especially the P0(Mj) which is probability that each expert model is correct and that "If all J expert models are considered to be equally likely in the prior, then P0(Mj) = 1/J for all J expert models. In this case, the aggregate prediction for an event is the simple average of the predicted probabilities of the J expert models."). 
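Stiber further teaches receiving feedback relating to the aggregated prediction (Pg. 1531 Section 2.1 Col. 2 "When evidence x is observed, this will determine the P(E|Mj) for each expert model and modify the probability that each expert model is correct, P(Mj)."). 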
Stiber further teaches storing the received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required.). 
Stiber further teaches updating, for each trained expert model, a weight associated with the trained expert model, using the received feedback (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." This weight should be equal to the relative probability that each individual expert model is correct given the evidence that has thus far been observed." The examiner notes that the relative probability based on the evidence that has thus been observed reads on the claimed "update...a weight"). 
Stiber further teaches computing a second aggregated prediction by computing an aggregation of the predictions which takes into account the weights (Pg. 1531 Note Equation 1 which is described as "the probability weighted 
Stiber further teaches updating, in a training stage, the plurality of trained expert models, using the stored received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required.). 
Stiber, however, does not appear to explicitly disclose: 
…wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests

Chyzhyk, however, does teach …wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests (Chyzhyk Section 3.2 Active learning. Note Algorithm 1 which shows “Active learning general algorithm”. Note the algorithm inputs, especially “Unlabeled test data Uk…” The examiner notes that this set of unlabeled test data teaches “unseen data example”. Next, note line 3 which shows that test samples (e.g. unseen data) are evaluated using active learning. That is, each test sample (e.g. unseen data) is classified according to the “current training set” (e.g. see line 2) or otherwise compared to the current training set. As a result of this (e.g. see line 4), the most uncertain pixels (e.g. samples) of each test sample are shown to the oracle (e.g. human with expert knowledge) and a label is assigned to those selected uncertain pixels. The examiner notes that an oracle (e.g. human with expert knowledge) interacting with the samples that the algorithm has deemed the most uncertain and the oracle providing a class for those most uncertain samples teaches “wherein the feedback [e.g. human providing a label] comprises a plurality of truth labels [e.g. the label assigned to each of the most uncertain pixels] for a subset [e.g. the most uncertain pixels] for the unseen data example [e.g. unlabeled test samples] corresponding to failed tests [e.g. it has not been successfully classified and/or the classifier is not confident in its predictions].”). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the multiple expert Bayesian network as taught by Stiber modified with the active learning and oracle feedback as 
With respect to Claim 16, the combination of Stiber and Chyzhyk teaches representing aggregation of the trained expert models using a probabilistic model and using the probabilistic model to update the weights and based at least in part on the feedback (Pg. 1531 Section 2.1 Equation 1, which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the "probability that expert model j is correct. Each P(Mj) is a weighting factor that is applied to its respective P(E|Mj)." Additionally, note that the system of Stiber, as a whole, is at least a Bayesian Belief Network, which, necessarily and by definition, is a probabilistic model.). 
With respect to Claim 18, the combination of Stiber and Chyzhyk teaches wherein each of the predictions comprises a plurality of elements of the unseen sensor data corresponding thereto, wherein computing the second aggregated prediction comprises computing an aggregation of initial elements of the plurality of elements using the updated weights, and wherein the initial elements are selected using the feedback and the initial element include less than all of the elements of the predictions (Pg. 1531 Col. 2 "With a BBN-based expert system, this likelihood function is readily calculated. For example, if the evidence x consists of findings A, B, and C, the likelihood function is by [Equation 5] where P0j(A) is the prior probability of event A predicted by model j before any evidence has been collected, 
With respect to Claim 19, the combination of Stiber and Chyzhyk teaches wherein the unseen sensor data example is a medical image representing a medical image volume and wherein the feedback about the aggregated prediction corresponds to a slice of the medical image volume (Chyzhyk Pg. 29 Figure 1 shows various volume rendered images of a brain. This reads on the claimed “medical image representing a medical image volume”. The examiner further notes that each image presented is described as a slice. For example, Pg. 29 Col. 1 Section 4 recites “Fig. 1 illustrates the variety of the imaging data by showing an axial slice of each of them.” Next, note Pg. 30 Figure 3 which is described as a “Manual delineation of the lesion by the neuropsychologist….” The action of manual delineation teaches the claimed “feedback about the aggregated prediction corresponds to a slice of the medical image volume”. Next, see Pg. 28 Col. 2 Section 3.2 “At iteration k, the active learning algorithm selects from Uk the q candidates with maximal uncertainty in their class prediction with the current classifier trained on the current training set Xk. The select samples Sk…are labeled with labels…by an oracle, often a human operator in 

With respect to Claim 20, Stiber teaches an image processing system comprising: a memory storing a plurality of trained expert models (Pg. 1531 Section 2.1 Col. 1 "For an expert system with J individual expert models, let Mj denote the expert model j, where j = 1, J.")
Stiber also teaches a processor configured to…for each trained expert model, compute a prediction…using the trained expert model (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." The examiner notes that a person of ordinary skill in the art would readily infer that if an individual expert makes a prediction then that individual expert MUST have received data.).
Stiber further teaches aggregate the predictions to form an aggregated prediction (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." In addition or alternatively note Equation 3. This equation is defined as the probability-weighted aggregate prediction of 
Stiber further teaches receive feedback relating to the aggregated prediction (Pg. 1531 Section 2.1 Col. 2 "When evidence x is observed, this will determine the P(E|Mj) for each expert model and modify the probability that each expert model is correct, P(Mj)."). 
Stiber further teaches store the received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required.). 
Stiber further teaches update, for each trained expert model, a weight associated with the trained expert model, using the received feedback (Pg. 1531 Section 2.1 Col. 1 "To create an aggregate expert model, the predictions from the individual experts can be weighted to yield an aggregate prediction for the probability of the event described by the model." This weight should be equal to the relative 
Stiber further teaches compute a second aggregated prediction by computing an aggregation of the predictions using the updated weights (Pg. 1531 Note Equation 1 which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the "probability that expert model j is correct. Each P(Mj) is a weighting factor that is applied to its respective P(E|Mj)." The examiner further refers to the response to arguments above.). 
Stiber further teaches update, in a training stage, the plurality of trained expert models, using the stored received feedback (Pg. 1532 Col. 1 “As such, to determine the likelihood function for each expert, the corresponding BBN model must be evaluated K times, beginning with an evaluation of the predicted probability of the first observed event under the prior model, followed by sequential evaluations for each additional observed event, using the model updated with the knowledge of the occurrence of all previously considered events.” The examiner notes that if the model is updated based on the knowledge of “all previous event” then, logically, the knowledge of these previous events MUST be stored or otherwise known to the system. Thus, Stiber teaches the claim language as required. The examiner further notes the response to arguments as discussed above.). 
Stiber, however, does not explicitly disclose
receive an image. 
…compute a prediction from the image…
Wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests

Chyzhyk, however, does teach receive an image (Chyzhyk Pg. 29 Figure 1 shows various volume rendered images of a brain. This reads on the claimed “image.”). 
	Chyzhyk also teaches compute a prediction from the image (Pg. 28 Col. 2 note Algorithm 1 and Step 4 “Evaluate uncertainty” Further Col. 2 recites “Assume that we have built a committee classifiers, i.e. a RF with T trees, so that classification is provided by the majority voting. The committee provides T labels for each candidate sample…so that the uncertainty of its classification may be measured by the standard deviation…of the distribution of the class predictions provided by the individual decision trees.” Predicting a class label for a particular sample (e.g. image) for each of the individual decision trees reads on the claim language.). 
Chyzhyk further teaches …wherein the feedback comprises a plurality of truth labels for a subset of the unseen data example corresponding to failed tests (Chyzhyk Section 3.2 Active learning. Note Algorithm 1 which shows “Active learning general algorithm”. Note the algorithm inputs, especially “Unlabeled test data Uk…” The examiner notes that this set of unlabeled test data teaches “unseen data example”. Next, note line 3 which shows that test samples (e.g. unseen data) are evaluated using active learning. That is, each test sample (e.g. unseen data) is classified according to the “current training set” (e.g. see line 2) or otherwise compared to the current training set. As a result of this (e.g. see line 4), the most uncertain pixels (e.g. samples) of each test sample are shown to the oracle (e.g. human with expert knowledge) and a label is assigned to those selected uncertain pixels. The examiner notes that an oracle (e.g. human with expert knowledge) interacting with the samples that the algorithm has deemed the most uncertain and the oracle providing a class for those most uncertain samples teaches “wherein the feedback [e.g. human providing a label] comprises a plurality of truth labels [e.g. the label assigned to each of the most uncertain pixels] for a subset [e.g. the most uncertain pixels] for the unseen data example [e.g. unlabeled test samples] corresponding to failed tests [e.g. it has not been successfully classified and/or the classifier is not confident in its predictions].”). 
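For illustration only, the committee-based uncertainty measure and oracle labeling step quoted from Chyzhyk can be sketched as follows. This is a minimal sketch under stated assumptions and is not the algorithm of Chyzhyk: the function names, the dictionary layout, and the use of integer class labels are hypothetical. Uncertainty is taken as the standard deviation of the labels voted by the individual trees, and the q most uncertain samples are passed to an oracle callable for truth labels.

```python
from statistics import pstdev


def select_most_uncertain(committee_votes, q):
    """Return the ids of the q samples whose committee votes disagree most.

    committee_votes: dict mapping a sample id to the list of integer class
                     labels voted by each tree (hypothetical layout).
    """
    # Uncertainty of a sample = standard deviation of its T predicted labels.
    uncertainty = {sid: pstdev(votes) for sid, votes in committee_votes.items()}
    ranked = sorted(uncertainty, key=uncertainty.get, reverse=True)
    return ranked[:q]


def query_oracle(selected_ids, oracle):
    """Ask the oracle (e.g. a human expert) for a truth label per selected id."""
    return {sid: oracle(sid) for sid in selected_ids}
```

For example, a sample on which all trees agree has zero uncertainty and is never selected ahead of a sample on which the trees are split.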
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the expert model and aggregated prediction as taught by Stiber modified with the input as an image as taught by Chyzhyk because using lesion tissue samples and/or Brain MRI images as input would lead to faster diagnosis times and more accurate prognosis (Chyzhyk Pg. 1 Col. 1). 
Claims 6, 10, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stiber et al. ("Site-Specific Updating and Aggregation of Bayesian Belief Network Models for Multiple Experts.") in view of Chyzhyk et al. (“An active learning approach for stroke lesion segmentation on multimodal MRI data.” NPL 2014) in view of Shivaswamy et al. ("Coactive Learning", NPL 2015).
With respect to Claim 6, the combination of Stiber and Chyzhyk teaches all of the limitations of Claim 1 as described above. 
The combination of Stiber and Chyzhyk further teaches wherein the processor is configured such that the update in the training stage comprises multiplying the weight associated with the trained expert model with a likelihood of the feedback…(Stiber Pg. 1531 Section 2.1 Equation 1, which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the "probability that expert model j is correct. Each P(Mj) is a weighting factor that is applied to its respective P(E|Mj)." Additionally, note that the system of Stiber, as a whole, is at least a Bayesian Belief Network, which, necessarily and by definition, is a probabilistic model.). 
The combination of Stiber and Chyzhyk, however, does not explicitly disclose:
…and then normalizing the weight. 
Shivaswamy, however, does disclose…and then normalizing the weight that has been multiplied with the likelihood of the feedback to a numerical value between zero and one (Pg. 12 Section 5.1 "After each multiplicative update, the weight are normalized to sum to one, and the steps of the algorithm repeat." The examiner notes that normalizing “After” reads on the claimed “…and then…”. The examiner further 
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the weight update using multiplication as taught by the combination of Stiber and Chyzhyk modified with the normalizing as taught by Shivaswamy because this would allow for more control and more efficient computation of the learning rate after each update (Shivaswamy Pg. 12).
With respect to Claim 10, the combination of Stiber, Chyzhyk, and Shivaswamy teaches wherein the processor is configured to receive the feedback in the form of user input relating to individual elements of the aggregated prediction (Shivaswamy Pg. 4 Section 3 Coactive Learning Model. “We now introduce coactive learning as a model of interaction (in rounds) between a learning system (e.g. search engine) and a human…where both the human and learning algorithm have the same goal (of obtaining good results). At each round t, the learning algorithm observes a context …(e.g. a search query) and presents a structured object…(e.g. a ranked list of URLs). The utility of [the structured object] to the user for context xt…is described by a utility function…As feedback the human user returns an improved object…(e.g. reordered list of URLs)…” The examiner notes that by reordering the list of URLs presented by the system, the user’s feedback “relates” the individual element (e.g. each individual URL) to the system’s overall (e.g. aggregated) prediction of ranked URLs.). 
	With respect to Claim 17, the combination of Stiber, Chyzhyk, and Shivaswamy teaches updating the weights by multiplying the weight associated with the trained expert model with a likelihood of the feedback…(Stiber Pg. 1531 Section 2.1 Equation 1, which is described as "the probability weighted aggregate prediction for the occurrence of the event of interest E." The examiner notes that P(Mj) is the "probability that expert model j is correct. Each P(Mj) is a weighting factor that is applied to its respective P(E|Mj)." Additionally, note that the system of Stiber, as a whole, is at least a Bayesian Belief Network, which, necessarily and by definition, is a probabilistic model.)…and then normalizing the weight that has been multiplied with the likelihood of the feedback to have a numerical value between one and zero (Shivaswamy. The examiner initially notes the rejection under 112(b) above. Pg. 12 Section 5.1 "After each multiplicative update, the weight are normalized to sum to one, and the steps of the algorithm repeat." The examiner notes that normalizing “After” reads on the claimed “…and then…”). 
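For illustration only, the claimed sequence of multiplying each weight by a likelihood of the feedback and then normalizing can be sketched as follows. This is a minimal sketch and is not taken from Stiber or Shivaswamy; the function name `update_weights` and its parameters are hypothetical. It assumes one non-negative likelihood value per expert model and normalizes so that the weights sum to one, which places each weight between zero and one.

```python
def update_weights(weights, likelihoods):
    """Multiply each expert weight by its feedback likelihood, then normalize.

    weights:     current weight of each expert model.
    likelihoods: likelihood of the observed feedback under each expert model
                 (hypothetical input, assumed non-negative).
    """
    if len(weights) != len(likelihoods):
        raise ValueError("one likelihood is required per weight")
    # Step 1: multiplicative update.
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("feedback has zero likelihood under every expert model")
    # Step 2: normalize so the weights sum to one.
    return [u / total for u in unnormalized]
```

Starting from equal weights, an expert whose prediction agrees with the feedback gains weight relative to one that does not.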

Claims 8-9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stiber et al. ("Site-Specific Updating and Aggregation of Bayesian Belief Network Models for Multiple Experts.") in view of Chyzhyk et al. (“An active learning approach for stroke lesion segmentation on multimodal MRI data.” NPL 2014) in view of Corso et al. (“Efficient Multilevel Brain Tumor Segmentation With Integrated Bayesian Model Classification”, NPL 2008). 

With respect to Claim 8, the combination of Stiber and Chyzhyk teaches all of the limitations of Claim 1 and Claim 7 as described above. 

Corso, however, does teach increasing the number of elements of the predictions which are aggregated by adding elements which are adjacent to the initial elements to increase a size of a feedback region (The examiner initially notes that this claim appears to be directed towards a known method of segmentation known as “region growing” (See at least Paragraph [0050] of as-filed specification). The examiner notes that, by definition of “region growing” the “number of elements” (e.g. pixels, voxels, etc.) added, necessarily are adjacent to the tested element. With this understanding, see Corso Fig.3. Also Section IV “Segmentation by Weighted Aggregation.” (Pgs 633-634) “The finest layer in the graph…is induced by the voxel lattice: each voxel I becomes a node…with six-neighbor connectivity, and node properties set according to the image…SWA proceeds by iteratively coarsening the graph according to the following algorithm…” Note the progression of Figure 3 with the above in mind. A person of ordinary skill in the art would readily infer that the number of elements in a given region are increased based on the “weighted aggregation”. As can be seen (Fig. 3), a feedback region is increased thus, Corso teaches the claim language. In the alternative, or in addition, Note Section IV (B) which discusses the use of Bayesian weighting.). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the expert model system as 
With respect to Claim 9, the combination of Stiber, Chyzhyk, and Corso teaches iteratively increasing the number of elements in the feedback region until there is no change between a current aggregated prediction and a previous aggregated prediction (Corso Pg. 636 “To optimize the coefficients for each class-pair, we perform an initial stochastic search for the best parameters followed by a steepest coordinate-descent procedure. The gradient of the function is estimated numerically at each iteration and the single coordinate that optimally modifies the affinities is adjusted. The procedure is terminated when no adjustment will improve the affinity over the training data.” The examiner notes that the referenced affinity is at least part of the weighted aggregation (See above). Further Pg. 632 which describes the “Bayesian model aware affinity” recites “This…avoids making premature hard assignments of nodes to models by integrating over all possible models and weighting by the class evidence and prior…” From the above, a person of ordinary skill in the art would readily infer that 1) the number of elements is increased (see for example Corso Figure 3) and 2) the increasing is stopped when there is no change between the current aggregated prediction (the latest output from Corso's weighted aggregation method) and the previous aggregated prediction (Pg. 632 explicitly discloses that the “prior” is used, at least in part, to calculate the affinity). Thus, Corso teaches the claim language as required.). 
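For illustration only, the claim language of Claims 8 and 9 (adding adjacent elements to a feedback region and stopping when the aggregated prediction no longer changes) can be sketched as follows. This is a generic region-growing sketch and is not Corso's segmentation by weighted aggregation; the function names, the `predict` callable, and the `max_rounds` guard are hypothetical. Only the six-neighbor connectivity is borrowed from the passage quoted above.

```python
def six_neighbors(voxel):
    """Return the six face-adjacent neighbors of an (x, y, z) voxel."""
    x, y, z = voxel
    return [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
            (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]


def grow_region(region, volume_shape):
    """Add every in-bounds voxel adjacent to the current region."""
    grown = set(region)
    for voxel in region:
        for neighbor in six_neighbors(voxel):
            if all(0 <= c < s for c, s in zip(neighbor, volume_shape)):
                grown.add(neighbor)
    return grown


def grow_until_stable(seed, volume_shape, predict, max_rounds=100):
    """Grow the region until predict(region) stops changing.

    predict: hypothetical callable returning the aggregated prediction
             computed over the elements currently in the region.
    """
    region = set(seed)
    previous = predict(region)
    for _ in range(max_rounds):
        region = grow_region(region, volume_shape)
        current = predict(region)
        if current == previous:
            # No change between current and previous prediction: stop.
            return region, current
        previous = current
    return region, previous
```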

Claim 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stiber et al. ("Site-Specific Updating and Aggregation of Bayesian Belief Network Models for Multiple Experts.") in view of Chyzhyk et al. (“An active learning approach for stroke lesion segmentation on multimodal MRI data.” NPL 2014) in view of Hametner et al. (“Local model network identification for online engine modelling”, NPL 2013).
With respect to Claim 11, the combination of Stiber and Chyzhyk teaches all of the limitations of Claim 10 as described above. 
The combination of Stiber and Chyzhyk, however, does not appear to explicitly disclose wherein the training stage is an offline training stage. 
Hametner, however, does teach wherein the training stage is an offline training stage (Pg. 214 Section 3 Evolving model tree. “In this section the enhancement of the offline (batch) training algorithm…is presented, in order to allow an online training…of the local model network...” Section 3.1 “In the context of local model networks, the use of an incremental tree is an effective model building strategy…the incremental model construction allows to gradually increase the complexity of the local model network: When the number of local models M is incremented by one, the worst local model (indexed by l) of the logistic discriminant tree (Fig. 2) is replaced by a new node and two adjoining local models are appended, see Fig. 5. On the one hand this strategy allows a proper initialization of the new model parameters while on the other hand the computational demand is low.” Section 3.5 “ See Also Fig. 1 Note that the models are updated (Denoted by the line from y(k) to the box of “prior information” (q-1). The examiner notes that iterative updating the models based on the new evidence (See 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the expert model system as taught by the combination of Stiber and Chyzhyk modified with the offline training and online training as taught by Hametner because this would reduce the computational resources required to perform such a process thereby reducing the time and cost of the system (Hametner Pg. 214). 
Claims 12-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stiber et al. ("Site-Specific Updating and Aggregation of Bayesian Belief Network Models for Multiple Experts.") in view of Chyzhyk et al. (“An active learning approach for stroke lesion segmentation on multimodal MRI data.”) and further in view of Rota-Bulo (“Online Learning with Bayesian Classification Trees”, NPL 2016, hereinafter “Bulo”)
With respect to Claim 12, the combination of Stiber and Chyzhyk teaches wherein the processor is configured to receive the feedback from a computer-implemented process and computing the second aggregated prediction comprises performing a Bayesian re-weighting of each of the individual expert models of the plurality of expert models…(Stiber Pg. 1534 Figure 2 shows the “Structure of Bayesian Belief Network” note that each event box represents different pieces of evidence (e.g. temperature, Hydrogen(H2), etc.). Therefore, a person of ordinary skill in the art would readily infer that the evidence (e.g. feedback) must be from any or all of the shown sources and this feedback must have been collected and “input” into the 
The combination of Stiber and Chyzhyk, however, does not appear to explicitly disclose: 
…and wherein the feedback comprises a plurality of rounds of Bayesian refinement of at least one decision forest, and in each round of the plurality of rounds, a posterior weight is updated
Bulo, however, teaches…and wherein the feedback comprises a plurality of rounds of Bayesian refinement of at least one decision forest, and in each round of the plurality of rounds, a posterior weight is updated (Bulo, first note Title, Section 2 (“classification trees”), and at least Figure 1. Figure 1 caption recites: “Example of the Bayesian online learning process: we start with a prior distribution…over the set of decision trees.” The examiner notes that “a set of decision trees” (e.g. more than one decision tree) necessarily teaches “at least one decision forest”. The examiner notes that using Bayes rule in an online update step where a posterior distribution is updated based on a new data sample, and this update rule is for x number of update steps, teaches “…and wherein the feedback comprises a plurality of rounds of Bayesian refinement of at least one decision forest, and in each round of the plurality of rounds, a posterior weight is updated”. The examiner further refers to Algorithm 1 on Pg. 3990 Col. 1).
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the Bayesian expert models and active learning as taught by the combination of Stiber and Chyzhyk modified with the Online Bayesian refinement and decision forests as taught by Bulo because online learning does not require the resources of conventional methods, thus the resource cost is less (Bulo Pg. 3985 Col. 1). 

	With respect to Claim 13, the combination of Stiber, Chyzhyk, and Bulo teaches 
wherein the at least one decision forest comprises at least one tree…(Bulo see at least Figure 1 note “set of decision trees”. The examiner notes that a set of decision trees teaches “wherein the at least one decision forest comprises at least one tree”.)

…and in response to a value for a calculated criteria being greater than or equal to a threshold, and a depth of the at least one tree that is less than a maximum value, a current leaf node of the at least one tree is set as a split node having child nodes that are trained…(First, with respect to “a value for a calculated criteria being greater than or equal to a threshold”. 
Bulo Section 2 discusses classification trees. In particular, “A classification tree is a classifier of decision nodes and prediction nodes, arranged in a tree-structure. Decision nodes correspond to the tree’s internal nodes N and are responsible for routing data samples to an appropriate prediction node (i.e. leaf) in L… Each decision node n ∈ N takes a routing decision for a data sample x…via a routing function bn: X → {0,1}. If bn(x) = 1 then x is routed to the left sub-tree, otherwise it goes to the right one. Akin to conventional decision trees we consider binary decision functions…” The examiner further notes Equation 1. Especially note the greater than or equal to inequality. 
That is, when some node n on a classification tree is presented with a data sample x, that node performs a binary decision function. As can be seen from Bulo equation 1, this binary decision function is a comparison of whether or not the truth value (e.g. Label value) is greater than or equal to 0. This binary decision function with a comparison of “greater than or equal to” teaches “…and in response to a value [e.g. indicator function] for a calculated criteria [e.g. truth value P] being greater than or equal to a threshold…” 
Second, with respect to a “depth”. Bulo Pg. 3989 Col. 2 “We select a prior distribution from the same family Q as the surrogate posterior given in [equation 8]…We instantiate a prior distribution by providing a tree structure S0 with some pre-defined depth…” Further note Pg. 3992 Col. 1 “We trained again ensembles comprising 8 balanced Bayesian trees, using maximum depths of 7 or 8…” Additionally, or in the alternative, Pg. 3991 Col. 1 “…As a rule of thumb, we define a dataset specific set of possible tree depths from where the actual tree depth is randomly selected. Specifically, we sample the tree depth from {⌈log2 (|Y|)⌉, . . . , ⌈log2 (|Y|)⌉+2} such that there are at least as many leaves as number of classes. E.g., the satimages dataset has 6 classes which means that we randomly select a tree depth between 3 and 5.” The examiner notes that creating a depth for a tree such that there are as many leaves as number of classes teaches “…and a depth of the at least one tree that is less than a maximum value…” wherein the maximum value is the number of classes as disclosed by Bulo above. 
	Third, with respect to “a current leaf node of the at least one tree is set as a split node having child nodes that are trained.” The examiner notes that Bulo Section 2 discusses classification trees. In particular, “A classification tree is a classifier of decision nodes and prediction nodes, arranged in a tree-structure. Decision nodes correspond to the tree’s internal nodes N and are responsible for routing data samples to an appropriate prediction node (i.e. leaf) in L…” Note equation 1 and Algorithm 1 (see Pg. 3990 Col. 1). Note that because the depth of the tree (see Algorithm 1 “Require:…tree structure”) is considered in the training (e.g. online learning) of the tree and because the “split test” is considered on the current node, Bulo teaches “a current leaf node of the at least one tree is set as a split node having child nodes that are trained.”
Finally, the examiner notes that, in conclusion, a data sample reaching a decision node (e.g. a current leaf node) and a binary decision function being made based on “a greater than or equal to” function and deciding that that sample is to go to a sub-tree based on the binary decision function result teaches “…and in response to a value for a calculated criteria being greater than or equal to a threshold, and a depth of the at least one tree that is less than a maximum value, a current leaf node of the at least one tree is set as a split node having child nodes that are trained…”)
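For illustration only, the depth range quoted from Bulo Pg. 3991 can be worked through as follows. This is a minimal sketch; the function names are hypothetical, and only the set {⌈log2(|Y|)⌉, …, ⌈log2(|Y|)⌉+2} and the six-class example come from the quoted passage. The random selection uses Python's standard `random.choice` as an assumed stand-in for the sampling described in the quote.

```python
import math
import random


def candidate_depths(num_classes):
    """Return the candidate tree depths ceil(log2(|Y|)) .. ceil(log2(|Y|)) + 2."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    base = math.ceil(math.log2(num_classes))
    return [base, base + 1, base + 2]


def sample_depth(num_classes, rng=random):
    """Pick one candidate depth at random (assumed uniform selection)."""
    return rng.choice(candidate_depths(num_classes))
```

For a dataset with 6 classes the candidates are 3, 4, and 5, which matches the "tree depth between 3 and 5" stated in the quote; a binary tree of depth 3 has up to 8 leaves, which is at least the number of classes.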

…and wherein each child node is trained using the subset of unseen sensor data elements at the current leaf node the subset of the unseen sensor data elements sent to the child node determined using parameters to optimize the calculated criteria and used in a binary test…(Bulo Section 2 “Decision nodes correspond to the tree’s internal nodes N and are responsible for routing data samples to an appropriate prediction node (i.e. leaf) in L… Each decision node n ∈ N takes a routing decision for a data sample x…via a routing function bn: X → {0,1}. If bn(x) = 1 then x is routed to the left sub-tree, otherwise it goes to the right one. Akin to conventional decision trees we consider binary decision functions…” The examiner further notes Equation 1. Especially note the greater than or equal to inequality. The examiner notes that this greater than or equal to inequality, equation 14 (See Pg. 3986 Col. 1), and/or Lines 5-6 each individually teach “…and used in a binary test”. Further note equation 14 on Pg. 3988 Col. 2. Note the “right down arrow” and “left down arrow”. This notation is explicitly defined as a binary routing function (see Pg. 3986 Col. 2). 
The examiner notes that the next training sample teaches “…the subset of unseen sensor data elements…” 
Further note Pg. 3991 Col. 1 “As a rule of thumb, we define a dataset specific set of possible tree depths from where the actual tree depth is randomly selected. Specifically, we sample the tree depth from {⌈log2 (|Y|)⌉, . . . , ⌈log2 (|Y|)⌉+2} such that there are at least as many leaves as number of classes. E.g., the satimages dataset has 6 classes which means that we randomly select a tree depth between 3 and 5.” 
The examiner notes that because the satellite images dataset was used, the “data samples” used to train the tree MUST have come from the satellite images dataset. Therefore, an input data sample from the satellite images dataset teaches “unseen [e.g. input into the system and used for learning] sensor data elements [e.g. images].”
Next, note Pg. 3986 Col. 1 “Indeed, each prediction node l holds a probability distribution…over labels in Y that will be used to deliver the final prediction for the data sample reaching it…”)
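
For illustration only, the tree-depth rule of thumb quoted above from Bulo Pg. 3991 Col. 1 can be checked with a short Python sketch. The function names candidate_depths and sample_depth are hypothetical and do not appear in Bulo; only the set {⌈log2 (|Y|)⌉, . . . , ⌈log2 (|Y|)⌉+2} and the 6-class example are taken from the quoted passage.

```python
import math
import random


def candidate_depths(num_classes):
    """Possible tree depths: ceil(log2(|Y|)) through ceil(log2(|Y|)) + 2."""
    base = math.ceil(math.log2(num_classes))
    return [base, base + 1, base + 2]


def sample_depth(num_classes, rng=random):
    """Randomly select one depth from the candidate set (hypothetical helper)."""
    return rng.choice(candidate_depths(num_classes))


# For 6 classes, ceil(log2(6)) = 3, so the candidate depths are 3, 4 and 5.
print(candidate_depths(6))  # [3, 4, 5]
# A binary tree of depth 3 has up to 2**3 = 8 leaves, at least as many as 6 classes.
print(2 ** candidate_depths(6)[0] >= 6)  # True
```

This agrees with the quoted statement that a 6-class dataset yields a randomly selected tree depth between 3 and 5.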
…and data elements of the unseen sensor data elements that pass the binary test form a first subset sent to a first child node (Bulo Pg. 3986 Col. 2 See Eq. 1. Note the description above the equation: “If bn(x) = 1, then x is routed to the left sub-tree…” The examiner notes that a data sample being sent to the left sub-tree upon being greater than or equal to the threshold teaches “and data elements of the unseen sensor data elements that pass the binary test form a first subset sent to a first child node…”)
…and unseen sensor data elements that fail the binary test form a second subset sent to a second child node (Bulo Pg. 3986 Col. 2 See Eq. 1. Note the description above the equation: “If bn(x) = 1, then x is routed to the left sub-tree, otherwise it goes to the right one”. The examiner notes that failing to meet the threshold and the subsequent sending of the data sample to the right sub-tree teaches “and unseen sensor data elements that fail the binary test form a second subset sent to a second child node…” In addition or in the alternative, note equation 14 and Algorithm 1 line 6, which show similar functionality)
…with each of the first and second child nodes being trained (Bulo Algorithm 1 Note the title of the algorithm “Online learning of Bayesian classification tree”. Because Algorithm 1 is applied recursively and requires at least the “latest surrogate posterior parameter” (e.g. the surrogate posterior parameter that was updated in the last iteration), Algorithm 1, and especially Lines 5 and 6, which apply the latest updates, shows a training process. Therefore, the use of Algorithm 1 in determining whether a data sample is sent to a right or left child node and the subsequent update of the parameters based on that determination teach “…with each of the first and second child nodes being trained.”). 
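
For illustration only, the routing of data samples into a first and a second subset by a binary test can be sketched in Python as follows. This is a simplified sketch under stated assumptions: the test shown compares a single feature against a threshold, which is not the decision function of Bulo Eq. 1, and the names binary_test and partition are hypothetical. Only the convention that a result of 1 routes to the left (first) child and any other result routes to the right (second) child follows the quoted passage.

```python
def binary_test(x, feature_index, threshold):
    """Return 1 if the sample passes the test, otherwise 0.

    Assumption for illustration: a plain "greater than or equal to"
    comparison on one feature of the sample.
    """
    return 1 if x[feature_index] >= threshold else 0


def partition(samples, feature_index, threshold):
    """Split samples into (first_subset, second_subset).

    Samples that pass the binary test go to the first (left) child;
    samples that fail go to the second (right) child.
    """
    first_subset, second_subset = [], []
    for x in samples:
        if binary_test(x, feature_index, threshold) == 1:
            first_subset.append(x)
        else:
            second_subset.append(x)
    return first_subset, second_subset


samples = [[0.2], [0.5], [0.9]]
first, second = partition(samples, feature_index=0, threshold=0.5)
print(first)   # [[0.5], [0.9]]
print(second)  # [[0.2]]
```

Each resulting subset would then be the data available for training the corresponding child node.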

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

2. Qiang, Zhu “Research and Application of a method for constructing decision forests”, NPL 2007. Similar inventive concept. Note especially the discussion of the Bayesian method on pg. 4. Note equation 4.2.2. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FEN TAMULONIS whose telephone number is (571)272-0934.  The examiner can normally be reached on 7:30AM-5:30PM MON-FRI EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-





/F.C.T./Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126