Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/18/2022 has been entered.

Status of Claims
This action is in reply to the amendments and remarks filed on 02/18/2022.
Claims 1, 5, 7-19, and 21-30 are pending.
Claims 1, 14, and 28-30 have been amended.
Claims 2-4, 6, and 20 have been canceled.

Response to Arguments
Applicant’s arguments, with respect to select rejections of claim(s) 2-4 under 35 U.S.C. 112(b), have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. 
Applicant’s second argument, with respect to the rejection(s) of claim(s) 1, 14, and 28-30 under 35 U.S.C. 103, has been considered but is not persuasive. More specifically, applicant argues that no cited reference teaches a number of the amended claim limitations of claims 1, 14, and 28-30, since Milton teaches “determining an amount of training error for each leaf node” or “for an entire tree”, but not “a second prediction quality parameter for each training object” as claimed. The examiner respectfully disagrees.
Given the breadth of the claim language, Milton has been found to teach the requirements of the claim. Milton teaches a leaf node classifying a training sample in a given iteration and determining the error for the leaf based on the training sample, thus meeting the requirements of the claim language. Milton, paragraphs 0066-0069, 0071-0075, 0080, and 0084, teaches “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (each training object corresponding to the respective node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”; teaches that “training error (generating…a node-level prediction parameter)” for “each leaf node” can be computed “at a given iteration” (each training object corresponding to the respective node); and teaches comparing the training errors for pruning leaves “calculated” with different “impurity measures” (generating…a node-level prediction parameter by averaging the second prediction quality parameters of each training object corresponding to the respective node).
Further, Gupta has been cited in the alternative as teaching the amended limitation. Gupta, paragraph 0173, teaches training a decision tree with “training samples” and that “once a Decision Tree is generated, an error estimation through this decision tree requires traversing the tree nodes (for each node of the decision tree) utilizing binary decisions based on the respective model parameters and their thresholds. Once an end-node (also called ‘leaf’) of a tree is reached (for each node of the decision tree), the error estimate (e.g., mean value of the error in all of its training samples (generating…a node-level prediction parameter by averaging the second prediction quality parameters of each training object)) associated with the leaf is used for the correction (corresponding to the respective node)”.
See the 35 U.S.C. 103 section below for the full mapping of claim limitations necessitated by applicant's amendments.
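For illustration only, the error-estimation step quoted from Gupta, paragraph 0173, may be sketched in Python as follows. The sketch assumes a binary tree in which each internal node tests one model parameter against a threshold (going left when the value is less than or equal to the threshold) and each leaf stores the errors of the training samples that reached it; the class and function names are hypothetical and are not taken from Gupta.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Node:
    # Internal node: index of the tested model parameter and its threshold.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf only (assumption): errors of the training samples that reached it.
    errors: list = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def find_leaf(node: Node, x) -> Node:
    # Traverse the tree with one binary decision per internal node.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def error_estimate(root: Node, x) -> float:
    # Leaf error estimate: mean error over all training samples of the leaf.
    return mean(find_leaf(root, x).errors)
```

Under these assumptions, the value associated with each leaf is simply the average of the per-sample errors stored at that leaf, which is the averaging relied upon in the mapping above.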

Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1 as amended recites a typo, “generating, for for each training object”; one way to correct this would be to recite “generating, for each training object”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 5, 7-10, 12, 14-16, 19, 21-24, 26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al. (US Pub 20150199699), hereinafter Milton, in view of Kimmel et al. (US Pub 20140214736), hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Gupta et al. (US Pub 20190008461), hereinafter Gupta.
Regarding claim 1, Milton teaches a method of determining a first prediction quality parameter for a decision tree in a decision tree prediction model (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach a computing system with processors and memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” (MLA) for calculating “leaf node” classification “confidence value[s]” or errors for each leaf (method of determining a first prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)), 
a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, which is well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)), 
the first prediction quality parameter being for evaluating prediction quality of the decision tree prediction model at a given iteration of training of the decision tree (paragraphs 0066, 0073-0074, and 0080 teach, for each node of a decision tree, determining decision tree leaf node classification “training error (the first prediction quality parameter being for evaluating prediction quality of the decision tree prediction model) at a given iteration (at a given iteration of training of the decision tree)”), the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree, the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error” for “each leaf node” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree); and further teach “aggregating the resulting plurality of decision trees to produce an aggregated binary classification decision tree” (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique));
the method being executable at a machine learning system that executes the decision tree prediction model (paragraphs 0020, 0059, 0064, 0073-0074, 0080, and 0086 teach a processor with memory for executing embodiments of the disclosure of “machine learning” techniques (method being executable at a machine learning system) including “decision trees” (that executes the decision tree prediction model)), 
the method comprising: 
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training objects containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055-0056, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated with the document) including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training objects containing an indication of a document) and determining training error at each node when the “status is known” for the training data (and a target associated with the document)); 
descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that “training error” for “each leaf node” can be computed “at a given iteration” (by the decision tree model at the given iteration of training)); 
generating, for for each training object of the set of training objects, a second prediction quality parameter based on a target of the respective training object and based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (for each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (generating…a second prediction quality parameter(s))” for “each leaf node (for the respective training object/for each training object of the set of training objects)” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set example used in the node (targets of the respective training object), before proceeding to other training vectors and iterations (based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object)); 
generating, for each node of the decision tree, a node-level prediction parameter by averaging the second prediction quality parameters of each training object corresponding to the respective node (paragraphs 0066-0069, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (each training object corresponding to the respective node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”; teach that “training error (generating…a node-level prediction parameter)” for “each leaf node” can be computed “at a given iteration” (each training object corresponding to the respective node); and teach comparing the training errors for pruning leaves “calculated” with different “impurity measures” (generating…a node-level prediction parameter by averaging the second prediction quality parameters of each training object corresponding to the respective node)); and
generating the first prediction quality parameter for the decision tree based on the node-level prediction parameters (paragraphs 0055, 0066, 0073-0075, 0080, and 0084 teach determining decision tree leaf node classification “training error (generating the first prediction quality parameter for the decision tree) at a given iteration” by (based on) “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (the node-level prediction parameters)” for “each leaf node (for each training object of the set of training objects)” can be computed “at a given iteration” when “audience membership status is known” for the training set examples, before proceeding to other training vectors and iterations (the node-level prediction parameters)).
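For illustration only, and without construing the claim, the following Python sketch shows one possible reading of the computation recited in claim 1. It assumes that each training object is a (features, target) pair supplied in the order of the ordered list; that a hypothetical function leaf_of maps features to the leaf into which the object is categorized; that the second prediction quality parameter is the squared error between the object's target and the mean target of the preceding objects in the same leaf (0.0 when none precede); and that the first prediction quality parameter is the unweighted mean of the node-level parameters. None of these choices is taken from the claim, the specification, or the cited references.

```python
from collections import defaultdict
from statistics import mean


def ordered_quality(objects, leaf_of):
    """objects: ordered list of (features, target); leaf_of: hypothetical leaf lookup."""
    seen = defaultdict(list)   # leaf id -> targets of earlier objects in that leaf
    per_object = []            # (leaf id, second prediction quality parameter)
    for features, target in objects:
        leaf = leaf_of(features)
        prior = seen[leaf]
        # Prediction uses only objects that occur earlier in the same leaf.
        prediction = mean(prior) if prior else 0.0
        per_object.append((leaf, (target - prediction) ** 2))
        prior.append(target)

    # Node-level parameter: average of the per-object parameters in each leaf.
    by_leaf = defaultdict(list)
    for leaf, quality in per_object:
        by_leaf[leaf].append(quality)
    node_level = {leaf: mean(qs) for leaf, qs in by_leaf.items()}

    # Tree-level (first) parameter: unweighted mean of node-level parameters.
    tree_level = mean(list(node_level.values()))
    return per_object, node_level, tree_level
```

Because each object's parameter depends only on targets that precede it in its leaf, reordering the list generally changes the per-object values, which is why the ordering limitation is material to this reading.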

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, wherein the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, wherein the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.).
Further, Milton at least implies the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (see mapping above); however, Kimmel explicitly teaches the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0006-0007 teach training a decision tree on sorted training data and node threshold values, and then creating a “decision tree ensemble…by repeating the training process as described” for further decision trees (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
Further still, Milton at least implies each training object of the set of training objects containing an indication of a document and a target associated with the document (see mapping above); however, Kesin explicitly teaches each training object of the set of training objects containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training objects containing an indication of a document and a target associated with the document)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, to include “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin in order to specifically train a decision tree on “document” data (Kesin, Col. 9, line 58-Col. 10, line 4).
Further still, Milton at least implies generating, for each node of the decision tree, a node-level prediction parameter by averaging the second prediction quality parameters of each training object corresponding to the respective node (see mapping above); however, Gupta explicitly teaches generating, for each node of the decision tree, a node-level prediction parameter by averaging the second prediction quality parameters of each training object corresponding to the respective node (paragraph 0173 teaches training a decision tree with “training samples” and that “once a Decision Tree is generated, an error estimation through this decision tree requires traversing the tree nodes (for each node of the decision tree) utilizing binary decisions based on the respective model parameters and their thresholds. Once an end-node (also called ‘leaf’) of a tree is reached (for each node of the decision tree), the error estimate (e.g., mean value of the error in all of its training samples (generating…a node-level prediction parameter by averaging the second prediction quality parameters of each training object)) associated with the leaf is used for the correction (corresponding to the respective node)”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta in order to increase accuracy of a prediction from training a decision tree with leaf error calculations (Gupta, abstract and paragraph 0173).

Regarding claim 5, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 1 above, and further teaches wherein descending comprises: 
descending the set of training objects through the decision tree based on an order of the ordered list of training objects (paragraphs 0066-0068, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree, classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree based on an order of the ordered list of training objects), through “a plurality of training iterations”, and that “training error” for “each leaf node” can be computed “at a given iteration”).

Regarding claims 14 and 29, Milton teaches a method of determining a prediction quality parameter for a decision tree in a decision tree prediction model, and a server configured to execute a Machine Learning Algorithm (MLA), the MLA being based on a decision tree prediction model based on a decision tree (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach a computing system with processors and memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” (MLA) for calculating “leaf node” classification “confidence value[s]” or errors (method of determining a prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)), 
a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, which is well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)), 
the prediction quality parameter being for evaluating prediction quality of the decision tree prediction model at a given iteration of training of the decision tree (paragraphs 0066, 0073-0074, and 0080 teach determining decision tree leaf node classification “training error (prediction quality parameter being for evaluating prediction quality of the decision tree prediction model) at a given iteration (at a given iteration of training of the decision tree)”), the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree, the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error” for “each leaf node” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree); and further teach “aggregating the resulting plurality of decision trees to produce an aggregated binary classification decision tree” (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique));
the method being executable at a machine learning system that executes the decision tree prediction model (paragraphs 0020, 0059, 0064, 0073-0074, 0080, and 0086 teach a processor with memory for executing embodiments of the disclosure of “machine learning” techniques (method being executable at a machine learning system) including “decision trees” (that executes the decision tree prediction model)), 
the method comprising: 
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training objects containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated with the document) including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training objects containing an indication of a document) and determining training error at each node (and a target associated with the document)); 
descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that “training error” for “each leaf node” can be computed “at a given iteration” (by the decision tree model at the given iteration of training)); 
generating the prediction quality parameter for the decision tree by: 
generating, for each training object of the set of training objects, a prediction quality approximation parameter based on a target of the respective training object, based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (for each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (generating…a prediction quality approximation parameter(s))” for “each leaf node (for the respective training object/for each training object of the set of training objects)” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set example used in the node (targets of the respective training object), before proceeding to other training vectors and iterations (based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object)), and based on at least one prediction quality approximation parameter of the respective training object generated during the previous iteration of the training of the previous decision tree (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (at least one prediction quality approximation parameter)” for “each leaf node (of the respective training object)” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision tree)); 
generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (paragraphs 0066-0069, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (each training object corresponding to the respective node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”; teach that “training error (generating…a node-level prediction parameter)” for “each leaf node” can be computed “at a given iteration” (each training object corresponding to the respective node); and teach comparing the training errors for pruning leaves “calculated” with different “impurity measures” (generating…a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node)); and
generating the prediction quality parameter for the given level of the decision tree based on the node-level prediction parameters (paragraphs 0055, 0066, 0073-0075, 0080, and 0084 teach determining decision tree leaf node classification “training error (generating the prediction quality parameter for the given level of the decision tree) at a given iteration”, such leaf nodes being well known to be located at one layer of a decision tree (for the given level of the decision tree), by (based on) “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (the node-level parameter)” for “each leaf node (for each training object of the set of training objects)” can be computed “at a given iteration” when “audience membership status is known” for the training set examples, before proceeding to other training vectors and iterations (the node-level parameter of each training object of the set of training objects)).
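For illustration only, and without construing the claim, the following Python sketch shows one possible reading of the additional dependency recited in claims 14 and 29, namely that each object's parameter also depends on a value carried over from the previous tree. It assumes that prev_approx holds each object's approximation from the previous decision tree; that the current tree's step for an object is the mean residual (target minus previous approximation) of the preceding objects in the same node (0.0 when none precede); that the prediction quality approximation parameter is the squared error of the updated approximation; and that the level parameter is the unweighted mean of the node-level parameters. The function leaf_of and all other names are hypothetical and are not drawn from the application or the cited references.

```python
from collections import defaultdict
from statistics import mean


def level_quality(objects, leaf_of, prev_approx):
    """objects: ordered (features, target) pairs; prev_approx: previous-tree values."""
    seen = defaultdict(list)   # node id -> residuals of earlier objects in that node
    new_approx, per_object = [], []
    for (features, target), prev in zip(objects, prev_approx):
        node = leaf_of(features)
        prior = seen[node]
        # Step for the current tree uses only objects earlier in the same node.
        step = mean(prior) if prior else 0.0
        approx = prev + step
        new_approx.append(approx)
        # Per-object parameter (assumption): squared error of updated approximation.
        per_object.append((node, (target - approx) ** 2))
        prior.append(target - prev)

    by_node = defaultdict(list)
    for node, quality in per_object:
        by_node[node].append(quality)
    node_level = {node: mean(qs) for node, qs in by_node.items()}
    level = mean(list(node_level.values()))
    return new_approx, node_level, level
```

Compared with the single-tree reading of claim 1, the only change under these assumptions is that the ordered, per-node averaging is applied to residuals from the previous tree rather than to raw targets.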

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.).
Further, Milton at least implies the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (see mapping above); however, Kimmel teaches the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0006-0007 teach training a decision tree on sorted training data and node threshold values, and then creating a “decision tree ensemble…by repeating the training process as described” for further decision trees (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
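The boosting concept relied on above, in which each new decision tree is trained after, and on the output of, the decision trees already in the ensemble, can be illustrated with the following minimal Python sketch. It is a generic gradient-boosting illustration under assumed choices (squared-error loss, depth-1 trees on a single numeric feature, a fixed learning rate) and is not code from Milton, Kimmel, or the present application; the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree to the residuals by choosing the threshold with the lowest squared error."""
    best = None
    # The largest distinct value is skipped because it would leave the right branch empty.
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    if best is None:  # all feature values equal: no split possible, predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv


def boost(xs, ys, rounds=3, learning_rate=0.5):
    """Train trees one after another; each new tree fits what the previous trees left unexplained."""
    base = sum(ys) / len(ys)  # initial constant prediction
    trees = []

    def predict(x):
        return base + sum(learning_rate * tree(x) for tree in trees)

    for _ in range(rounds):
        # Residuals of the ensemble built in the previous iterations
        # (the negative gradient of squared-error loss).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict
```

With xs = [1, 2, 3, 4] and ys = [1, 1, 3, 3], each round halves the remaining residual, so after three rounds the ensemble predicts 1.125 at x = 1 and 2.875 at x = 4.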
Further still, Milton at least implies each training object of the set of training object containing an indication of a document and a target associated with the document (see mapping above); however, Kesin teaches each training object of the set of training object containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training object containing an indication of a document and a target associated with the document)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, to include “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin in order to specifically train a decision tree on “document” data (Kesin, Col. 9, line 58-Col. 10, line 4).
Further still, Milton at least implies generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (see mapping above); however, Gupta teaches generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (paragraph 0173 teaches training a decision tree with “training samples” and “once a Decision Tree is generated, an error estimation through this decision tree requires traversing the tree nodes (for each node of the decision tree) utilizing binary decisions based on the respective model parameters and their thresholds. Once an end-node (also called ‘leaf’) of a tree is reached (for each node of the decision tree), the error estimate (e.g., mean value of the error in all of its training samples (generating…a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object)) associated with the leaf is used for the correction (corresponding to the respective node)”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta in order to increase accuracy of a prediction from training a decision tree with leaf error calculations (Gupta, abstract and paragraph 0173).
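The averaging limitation mapped to Gupta above (a node-level value obtained as the mean of per-object values, as in Gupta’s “mean value of the error in all of its training samples”), followed by a single value for the level, can be sketched as follows. The helper names are hypothetical, and combining the node-level values by weighting each by its object count is an assumption; neither the claim nor Gupta, paragraph 0173, specifies how the level value is formed.

```python
from collections import defaultdict


def node_level_parameters(node_ids, object_params):
    """Average the per-object parameters of the objects that reached each node (hypothetical helper)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, value in zip(node_ids, object_params):
        sums[node] += value
        counts[node] += 1
    averages = {node: sums[node] / counts[node] for node in sums}
    return averages, dict(counts)


def level_parameter(node_averages, node_counts):
    """Assumed combination: weight each node's average by the number of objects in that node."""
    total = sum(node_counts.values())
    return sum(node_averages[n] * node_counts[n] for n in node_averages) / total
```

For node assignments [0, 0, 1, 1, 1] and per-object values [1.0, 3.0, 2.0, 2.0, 5.0], the node averages are 2.0 and 3.0 and the level value is 2.6. With this weighting the level value equals the plain mean of all per-object values; an unweighted mean of the node averages would be another possible reading.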

Regarding claims 7 and 21, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 1 and 14 above and further teaches wherein the organizing the set of training objects into an ordered list of training objects comprises: 
generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (there is at least one of: (i) a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (there is at least one of: (ii) a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it); 
a given one of the plurality of ordered lists of training objects having at least a partially different order from others of the plurality of ordered lists of training objects (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (a given one of the plurality of ordered lists of training objects having at least a partially different order from others of the plurality of ordered lists of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”. In other words, differently organized training data sample subsets are distributed to different sources (a given one of the plurality of ordered lists of training objects having at least a partially different order from others of the plurality of ordered lists of training objects).).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.
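The reading of Kimmel applied to claims 7 and 21, in which the samples are sorted in ascending order separately for each attribute so that different lists order the same samples differently, can be sketched as follows. The sketch assumes numeric attributes held in tuples and uses hypothetical function names; it illustrates the claim mapping and is not Kimmel’s implementation.

```python
def ordered_lists_by_attribute(samples):
    """Return one list of sample indices per attribute, sorted by ascending value of that attribute."""
    num_attributes = len(samples[0])
    return [
        sorted(range(len(samples)), key=lambda i, a=a: samples[i][a])
        for a in range(num_attributes)
    ]


def neighbours(ordered, position):
    """Return the (preceding, subsequent) entries around a position; None marks a missing neighbour."""
    before = ordered[position - 1] if position > 0 else None
    after = ordered[position + 1] if position + 1 < len(ordered) else None
    return before, after
```

For samples [(3, 0.5), (1, 0.9), (2, 0.1)], the attribute-0 list is [1, 2, 0] and the attribute-1 list is [2, 0, 1]. In each list, every sample except the first has a preceding sample and every sample except the last has a subsequent sample.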

Regarding claims 8 and 22, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 7 and 21 above and further teaches selecting the given one of the plurality of ordered lists of training objects (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree (selecting the given one of the plurality of ordered lists of training objects)).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 9 and 23, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 8 and 22 above and further teaches wherein the selecting is done for each iteration of generating of the first [claim 9] prediction quality parameter (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree for comparing results to a node’s “threshold” (generating of the (first) prediction quality parameter), and that this occurs “During each tree level [training] iteration” (wherein the selecting is done for each iteration of generating of the (first) prediction quality parameter)).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 10 and 24, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 8 and 22 above and further teaches wherein the selecting is done in an entirety of a process of verification of prediction quality for a given decision tree (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree for comparing results to a node’s “threshold” (wherein the selecting is done in an entirety of a process of verification of prediction quality for a given decision tree), and that this occurs “During each tree level [training] iteration” (wherein the selecting is done in an entirety of a process of verification of prediction quality for a given decision tree)).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 12 and 26, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 1 and 14 above and further teaches wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with a rule (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (set of training objects is not associated with an inherent temporal relationship of training objects). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects) in consecutive ascending order (in accordance with a rule) by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claim 15, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 14 above and further teaches wherein the method further comprises calculating, for each respective training object, an indication of the at least one quality approximation parameter of the respective training object generated during the previous iteration of the training of the previous decision tree (Milton, paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for each of the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and that “training error (calculating, for each respective training object, an indication of the at least one quality approximation parameter of the given training object)” for “each leaf node (of the respective training object)” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision tree)).

Regarding claim 16, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 15 above and further teaches wherein the calculating comprises: 
splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (splitting the ordered list of training objects into a plurality of chunks), each corresponding to one of a plurality of attributes. The data subsets (by splitting the ordered list of training objects into a plurality of chunks) are distributed to a plurality of slave processing units after sorting the data samples (the plurality of chunks being organized into at least two levels of chunks) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”, and teaches comparing to a node’s “threshold”. In other words, differently organized training data sample subsets are distributed to different sources (the plurality of chunks being organized into at least two levels of chunks).).
Milton, Kimmel, Kesin, and Gupta are combinable for the same rationale as set forth above with respect to claims 1 and 14.
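One way to picture “splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks” is a coarse level of fixed-size chunks, each of which is split again into smaller chunks. The chunk sizes, the nesting scheme, and the function names below are assumptions chosen for illustration; neither claim 16 nor Kimmel, paragraph 0006, fixes them.

```python
def split_into_chunks(ordered, size):
    """Cut an ordered list into consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("chunk size must be positive")
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def two_level_chunks(ordered, coarse_size, fine_size):
    """Level 1: coarse chunks of the whole list; level 2: each coarse chunk cut into finer chunks."""
    coarse = split_into_chunks(ordered, coarse_size)
    fine = [split_into_chunks(chunk, fine_size) for chunk in coarse]
    return coarse, fine
```

For an ordered list of ten objects with coarse_size = 4 and fine_size = 2, the first level holds chunks of 4, 4, and 2 objects, and the second level splits those into pairs.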

Regarding claim 19, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 16 above and further teaches wherein the calculating the indication of the at least one quality approximation parameter of the respective training object generated during the previous iteration of the training of the previous decision tree comprises: 
for the respective training object, calculating at least one quality approximation parameter based on the training objects located in the same chunk as the respective training object (Milton, paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” (of the respective training object) of the training vectors that can be “sorted” (based on the training objects located in the same chunk as the respective training object), during a training iteration of “a plurality of training iterations”, and that “training error” (calculating the indication of the at least one quality approximation parameter) for “each leaf node” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision tree comprises)).
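For claim 19, a per-object approximation computed only from the objects located in the same chunk could look like the following. The specific rule used here (the mean target of the other objects in the chunk, with an assumed prior for an object alone in its chunk) and the function name are hypothetical; the claim requires only that the parameter be based on the training objects located in the same chunk as the respective training object.

```python
def chunk_approximations(targets, chunk_size, prior=0.0):
    """For each object (in list order), average the targets of the other objects in its chunk.

    Hypothetical rule for illustration; an object alone in its chunk receives `prior`.
    """
    approximations = []
    for start in range(0, len(targets), chunk_size):
        chunk = targets[start:start + chunk_size]
        chunk_total = sum(chunk)
        others = len(chunk) - 1
        for target in chunk:
            approximations.append((chunk_total - target) / others if others else prior)
    return approximations
```

With targets [1.0, 3.0, 5.0, 7.0, 9.0] and chunks of two, each object in the first two chunks receives its chunk partner’s target, and the last object, alone in its chunk, receives the prior.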

Regarding claim 28, Milton teaches a server configured to execute a Machine Learning Algorithm (MLA), the MLA being based on a decision tree prediction model based on a decision tree (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach a computing system with processors and memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” (MLA) for calculating “leaf node” classification “confidence value[s]” or errors for each leaf (method of determining a first prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)), a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)), the server further configured to:
access, from a non-transitory computer-readable medium of the server, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory in a server (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing server) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training objects containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated with the document) including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training objects containing an indication of a document) and determining training error at each node (and a target associated with the document)); 
descend the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that “training error” for “each leaf node” can be computed “at a given iteration” (by the decision tree model at the given iteration of training)); 
generate, for each training object of the set of training objects, a first prediction quality parameter for the respective training object of the set based on a target of the respective training object and based on targets of a subset of the set of training objects, wherein the subset of the set of training objects includes training objects that occur before the given training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object (paragraphs 0055, 0066, 0073-0075, and 0084 and claims 5 and 8-9 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (for each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and that “training error (generating…first prediction quality parameter)” for “each leaf node (for the respective training object/for each training object of the set of training objects)” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set examples or a selected “subset of the training set” (a target of the respective training object), before proceeding to other training vectors and iterations (based on targets of a subset of the set of training objects, wherein the subset of the set of training objects includes training objects that occur before the given training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object)); 
generate, for each node of the decision tree, a node-level prediction parameter by averaging the first prediction quality parameters of each training object corresponding to the respective node (paragraphs 0066-0069, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (each training object corresponding to the respective node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and that “training error (generating…a node-level prediction parameter)” for “each leaf node” can be computed “at a given iteration” (each training object corresponding to the respective node), and comparing the training errors for pruning leaves “calculated” with different “impurity measures” (generating…a node-level prediction parameter by averaging the first prediction quality parameters of each training object corresponding to the respective node)); and
generate a second prediction quality parameter for the given level of the decision tree based on the node-level prediction parameters (paragraphs 0055, 0066, 0073-0075, 0080, and 0084 teach determining decision tree leaf node classification “training error (generating a second prediction quality parameter for the given level of the decision tree) at a given iteration”, well known to be located at one layer of a decision tree (for the given level of the decision tree), by (based on) “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (each training object of the set of training objects) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and that “training error (the node-level prediction parameter)” for “each leaf node (for each training object of the set of training objects)” can be computed “at a given iteration” when “audience membership status is known” for the training set examples, before proceeding to other training vectors and iterations (the node-level prediction quality parameters)).

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, wherein the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, wherein the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.).
Further, Milton at least implies each training object of the set of training object containing an indication of a document and a target associated with the document (see mapping above); however, Kesin teaches each training object of the set of training object containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training object containing an indication of a document and a target associated with the document)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, to include “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin in order to specifically train a decision tree on “document” data (Kesin, Col. 9, line 58-Col. 10, line 4).
Further still, Milton at least implies generating, for each node of the decision tree, a node-level prediction parameter by averaging the first prediction quality parameters of each training object corresponding to the respective node (see mapping above); however, Gupta teaches generating, for each node of the decision tree, a node-level prediction parameter by averaging the first prediction quality parameters of each training object corresponding to the respective node (paragraph 0173 teaches training a decision tree with “training samples” and “once a Decision Tree is generated, an error estimation through this decision tree requires traversing the tree nodes (for each node of the decision tree) utilizing binary decisions based on the respective model parameters and their thresholds. Once an end-node (also called ‘leaf’) of a tree is reached (for each node of the decision tree), the error estimate (e.g., mean value of the error in all of its training samples (generating…a node-level prediction parameter by averaging the first prediction quality parameters of each training object)) associated with the leaf is used for the correction (corresponding to the respective node)”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta in order to increase accuracy of a prediction from training a decision tree with leaf error calculations (Gupta, abstract and paragraph 0173).
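The limitation of claim 28 under which the first prediction quality parameter of each object depends on its own target and on the targets of objects that occur before it in the ordered list and share its leaf can be illustrated with the sketch below. Using the mean of those preceding targets as the prediction, a prior of 0.0 when no such object exists, and squared error as the quality measure are all assumptions; the function name is hypothetical. The resulting per-object values would then be averaged per node, as the next limitation of the claim recites.

```python
from collections import defaultdict


def first_quality_parameters(targets, leaf_ids, prior=0.0):
    """Per-object quality values computed in ordered-list order.

    targets[k] and leaf_ids[k] describe the k-th object of the ordered list. The prediction
    for an object is the mean target of the objects that precede it AND share its leaf
    (`prior` if there are none); the parameter is the squared error against its own target.
    """
    running_sum = defaultdict(float)
    running_count = defaultdict(int)
    parameters = []
    for target, leaf in zip(targets, leaf_ids):
        seen = running_count[leaf]
        prediction = running_sum[leaf] / seen if seen else prior
        parameters.append((target - prediction) ** 2)
        # Only after scoring does this object become a "preceding" object for later ones.
        running_sum[leaf] += target
        running_count[leaf] += 1
    return parameters
```

For targets [1.0, 0.0, 1.0, 1.0] in leaves ["a", "a", "b", "a"], the parameters are [1.0, 1.0, 1.0, 0.25]; the fourth object is predicted from the mean (0.5) of the two earlier objects in leaf "a". Because an object’s own target is added to the running sums only after its parameter is computed, no object is scored against a prediction that already includes its own target.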

Regarding claim 30, Milton teaches a method of determining a prediction quality parameter for a decision tree in a decision tree prediction model (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach a computing system with processors and memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” for calculating “leaf node” classification “confidence value[s]” or errors (method of determining a prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)), 
a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)), 
the prediction quality parameter being for evaluating prediction quality of the decision tree prediction model at a given iteration of training of the decision tree (paragraphs 0066, 0073-0074, and 0080 teach determining decision tree leaf node classification “training error (prediction quality parameter being for evaluating prediction quality of the decision tree prediction model) at a given iteration (at a given iteration of training of the decision tree)”), the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree, the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and that “training error” for “each leaf node” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree); and further teach “aggregating the resulting plurality of decision trees to produce an aggregated binary classification decision tree” (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique));
the method being executable at a machine learning system that executes the decision tree prediction model (paragraphs 0020, 0059, 0064, 0073-0074, 0080, and 0086 teach processor with memory for executing embodiments of the disclosure of “machine learning” techniques (method being executable at a machine learning system) including “decision trees” (that executes the decision tree prediction model)), 
the method comprising: 
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training object containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated with the document) including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training object containing an indication of a document) and determining training error at each node (and a target associated with the document)); 
descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that “training error” for “each leaf node” can be computed “at a given iteration” (by the decision tree model at the given iteration of training)); 
generating the prediction quality parameter for the decision tree by: 
generating, for each training object of the set of training objects, a prediction quality approximation parameter based on a target of the respective training object, based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (for each training object of the set of training objects) in “a leaf node” during a training iteration of “a plurality of training iterations”, and teach that “training error (generating…a prediction quality approximation parameter(s))” for “each leaf node (for the respective training object/for each training object of the set of training objects)” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set example used in the node (targets of the respective training object), before proceeding to other training vectors and iterations (based on targets of only those training objects that occur before the respective training object in the ordered list of training objects and that have been categorized in a same leaf node as the respective training object)), based on at least one prediction quality approximation parameter of the respective training object generated during the previous iteration of the training of the previous decision trees (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node” during a training iteration of “a plurality of training iterations”, and teach that “training error (at least one prediction quality approximation parameter)” for “each leaf node (of the respective training object)” can be computed “at a given iteration” before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision trees)); 
generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (paragraphs 0066-0069, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (each training object corresponding to the respective node)” of the training vectors in “a leaf node” during a training iteration of “a plurality of training iterations”, teach that “training error (generating…a node-level prediction parameter)” for “each leaf node” can be computed “at a given iteration” (each training object corresponding to the respective node), and teach comparing the training errors for pruning leaves “calculated” with different “impurity measures” (generating…a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node)); and
generating the prediction quality parameter for the given level of the decision tree based on the node-level prediction parameters (paragraphs 0055, 0066, 0073-0075, 0080, and 0084 teach determining decision tree leaf node classification “training error (generating the prediction quality parameter for the given level of the decision tree) at a given iteration”, which is well known to be located at one layer of a decision tree (for the given level of the decision tree), by (based on) “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying each “vector” of the training vectors (each training object of the set of training objects) in “a leaf node” during a training iteration of “a plurality of training iterations”, and teach that “training error (the node-level prediction parameters)” for “each leaf node (for each training object of the set of training objects)” can be computed “at a given iteration” when “audience membership status is known” for the training set examples, before proceeding to other training vectors and iterations (the node-level prediction parameters)).
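For illustration only, the three generating steps recited in claim 30 can be pictured with the minimal Python sketch below. It reflects one possible reading of the claim language and is not drawn from Milton, Kimmel, Kesin, or Gupta. The function name, the squared-error quality measure, the use of the mean residual of preceding same-leaf objects as the leaf estimate, and the size-weighted combination of node averages are all assumptions, because the claim does not specify these formulas.

```python
from collections import defaultdict


def ordered_quality(targets, leaves, prev_approx):
    """Hypothetical sketch of the claim 30 computation (one possible reading).

    targets[i]     -- target of the i-th object in the ordered list
    leaves[i]      -- leaf (child node) the object was categorized into
    prev_approx[i] -- approximation for object i from the previous tree(s)
    Returns (per-object quality, per-node average, level-wide value).
    """
    residual_sum = defaultdict(float)  # residuals of objects already seen, per leaf
    residual_cnt = defaultdict(int)
    quality = []
    for t, leaf, prev in zip(targets, leaves, prev_approx):
        # Leaf estimate uses ONLY earlier objects in the same leaf (0.0 if none).
        cnt = residual_cnt[leaf]
        leaf_estimate = residual_sum[leaf] / cnt if cnt else 0.0
        approx = prev + leaf_estimate        # builds on the previous iteration
        quality.append((t - approx) ** 2)    # assumed squared-error quality
        # Only now record this object's residual, so it never informs itself.
        residual_sum[leaf] += t - prev
        residual_cnt[leaf] += 1

    if not quality:
        return [], {}, 0.0

    # Node-level parameter: average of per-object quality within each node.
    per_node = defaultdict(list)
    for q, leaf in zip(quality, leaves):
        per_node[leaf].append(q)
    node_level = {leaf: sum(qs) / len(qs) for leaf, qs in per_node.items()}

    # Level-wide parameter: node averages weighted by node size (assumption).
    level = sum(node_level[l] * len(per_node[l]) for l in node_level) / len(quality)
    return quality, node_level, level
```

For example, with targets `[1.0, 0.0, 1.0, 1.0]`, leaves `["a", "a", "b", "a"]`, and a previous approximation of `0.5` for every object, the per-object values are `[0.25, 1.0, 0.25, 0.25]`, the node averages are `0.5` for `a` and `0.25` for `b`, and the level value is `0.4375`. The property the loop is meant to show is that each object's estimate is formed before that object's own residual is recorded.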

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object; and ...by splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, for a given sample in the training dataset, other training samples are sorted in consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, for a given sample in the training dataset, other training samples are sorted in consecutive ascending order after it.), and
…by splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (by splitting the ordered list of training objects into a plurality of chunks), each corresponding to one of a plurality of attributes. The data subsets (by splitting the ordered list of training objects into a plurality of chunks) are distributed to a plurality of slave processing units after sorting the data samples (the plurality of chunks being organized into at least two levels of chunks) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level.” In other words, the organized training data sample subsets are distributed to different slave processing units (the plurality of chunks being organized into at least two levels of chunks).).
Further, while Milton at least implies the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (see mapping above), Kimmel teaches the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0006-0007 teach training a decision tree on sorted training data and node threshold values, and then creating a “decision tree ensemble…by repeating the training process as described” for further decision trees (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
Further still, while Milton at least implies each training object of the set of training object containing an indication of a document and a target associated with the document (see mapping above), Kesin teaches each training object of the set of training object containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training object containing an indication of a document and a target associated with the document)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, to include “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin in order to specifically train a decision tree on “document” data (Kesin, Col. 9, line 58-Col. 10, line 4).
Further still, while Milton at least implies generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (see mapping above), Gupta teaches generating, for each node of the decision tree, a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object corresponding to the respective node (paragraph 0173 teaches training a decision tree with “training samples” and “once a Decision Tree is generated, an error estimation through this decision tree requires traversing the tree nodes (for each node of the decision tree) utilizing binary decisions based on the respective model parameters and their thresholds. Once an end-node (also called ‘leaf’) of a tree is reached (for each node of the decision tree), the error estimate (e.g., mean value of the error in all of its training samples (generating…a node-level prediction parameter by averaging the prediction quality approximation parameters of each training object)) associated with the leaf is used for the correction (corresponding to the respective node)”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta in order to increase accuracy of a prediction from training a decision tree with leaf error calculations (Gupta, abstract and paragraph 0173).

Claims 11, 17, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Gupta et al (US Pub 20190008461) hereinafter Gupta, in view of Fano et al (US Pub 20050189415) hereinafter Fano.
Regarding claims 11 and 25, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 1 and 14 above; however, the combination does not explicitly teach wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship.
Fano teaches wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship (paragraphs 0159-0160 teach training a decision tree classifier with a training set in which the examples are in “temporal order” (wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, as modified by calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta, to include training a decision tree classifier with a training set in which the examples are in “temporal order” as taught by Fano in order to increase accuracy of a prediction based on times related to specific training data (Fano, paragraphs 0052 and 0160).
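For illustration only, organizing training objects "in accordance with the temporal relationship" (claims 11 and 25) amounts to ordering them by an inherent timestamp. The minimal Python sketch below assumes a hypothetical `TrainingObject` record with `doc_id`, `target`, and `timestamp` fields; none of these names come from the application or the cited references.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TrainingObject:
    doc_id: str          # hypothetical document identifier
    target: float        # target associated with the document
    timestamp: datetime  # assumed inherent time of the object


def order_temporally(objects):
    """Return a new list ordered oldest-first; ties keep their input order."""
    # sorted() is stable, so objects sharing a timestamp stay in input order.
    return sorted(objects, key=lambda obj: obj.timestamp)
```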

Regarding claim 17, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 16 above. However, while Kimmel does teach dividing data used for training into subsets, the combination does not explicitly teach wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects.
Fano teaches wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects (paragraphs 0159-0160 teach “the example sets (training objects) were split into a training set, which included the first 80% of examples in temporal order (and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects), and a test set, which included the last 20% (wherein a chunk of a given level of chunks contains a first pre-defined number of training objects)” for a decision tree classifier).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, as modified by calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta, to include training a decision tree classifier with a training set in which the examples are in “temporal order” as taught by Fano in order to increase accuracy of a prediction based on times related to specific training data (Fano, paragraphs 0052 and 0160).
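For illustration only, the split quoted from Fano (the first 80% of examples in temporal order as a training set and the last 20% as a test set) can be sketched in Python as follows. The helper name `temporal_split` and its `time_key` and `train_fraction` parameters are hypothetical and are not taken from Fano.

```python
def temporal_split(examples, time_key, train_fraction=0.8):
    """Order examples by time, then split: earliest share to training, rest to test.

    Mirrors the 80%/20% split quoted from Fano; the helper name and
    parameters are hypothetical.
    """
    ordered = sorted(examples, key=time_key)
    cut = int(len(ordered) * train_fraction)  # size of the earlier (larger) part
    return ordered[:cut], ordered[cut:]
```

With ten examples timestamped 0 through 9, the training set holds times 0 to 7 and the test set holds times 8 and 9.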

Claims 13 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Gupta et al (US Pub 20190008461) hereinafter Gupta, in view of Hong et al (US Pub 20040111169) hereinafter Hong.
Regarding claims 13 and 27, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claims 1 and 14 above; however, the combination does not explicitly teach wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in a randomly generated order.
Hong teaches wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in a randomly generated order (paragraphs 0027 and 0061 teach training models, including “decision trees”, on a “randomly permuted” order of a “training set” with no timestamps (wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in a randomly generated order)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, as modified by calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta, to include training models, including “decision trees”, on a “randomly permuted” order of a “training set” as taught by Hong in order to improve “class probability estimation” and “determine the optimal value” of a prediction model (Hong, paragraphs 0008-0009 and 0061).
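For illustration only, organizing training objects "in a randomly generated order" (claims 13 and 27), in the sense of the "randomly permuted" training set described in Hong, can be sketched with Python's standard `random` module. The helper name `random_order` and its optional `seed` parameter are hypothetical.

```python
import random


def random_order(objects, seed=None):
    """Return the objects in a randomly generated order (no timestamps needed)."""
    rng = random.Random(seed)  # seeded generator so an ordering can be reproduced
    shuffled = list(objects)   # copy so the caller's sequence is left untouched
    rng.shuffle(shuffled)      # in-place uniform random permutation
    return shuffled
```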

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Gupta et al (US Pub 20190008461) hereinafter Gupta, in view of Lee et al (US Patent 5978497) hereinafter Lee.
Regarding claim 18, the combination of Milton, Kimmel, Kesin, and Gupta teaches all the claim limitations of claim 16 above. However, while Kimmel does teach dividing data used for training into subsets, the combination does not explicitly teach wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects.
Lee teaches wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects (Col. 30, lines 40-43 and Col. 34, line 66-Col. 35, line 15 teach a decision tree’s “training data is randomly divided into five equal sets” (wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, as modified by calculating a decision tree’s leaf mean error value from all of the leaf’s training samples as taught by Gupta, to include dividing training data “into five equal sets” as taught by Lee in order to track and improve performance of the classifier (Lee, Col. 34, line 66-Col. 35, line 15).
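For illustration only, the nested chunk structure recited in claim 18 can be pictured with the minimal Python sketch below, under one possible reading of the claim language that is assumed here: the chunk at each level is a prefix of the ordered list, and each lower-level chunk contains the preceding level's objects plus an equal number of objects that follow them sequentially, so chunk sizes double. The helper name `prefix_chunk_levels` and the `base` parameter are hypothetical and are not taken from the application or from Lee.

```python
def prefix_chunk_levels(ordered, base):
    """Hypothetical chunk hierarchy read from the claim 18 language.

    Level 0 holds the first `base` objects; each lower level holds the
    previous level's objects plus the same number of objects that follow
    them sequentially, so chunk sizes double until the list is exhausted.
    """
    if base <= 0:
        raise ValueError("base must be positive")
    levels = []
    size = base
    while True:
        levels.append(ordered[:size])  # prefix of the ordered list
        if size >= len(ordered):       # last level may be shorter than double
            break
        size *= 2
    return levels
```

For a ten-object list and `base=2`, the chunk sizes are 2, 4, 8, and 10; the final chunk is capped by the length of the list.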

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Chickering et al (US Patent 7930353) teaches generating decision trees with leaf nodes with corresponding decision confidence thresholds.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123