Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the application filed on 06/05/2018.
Claims 1-30 are pending.
Claims 1-30 are rejected.

Information Disclosure Statement
The information disclosure statement (IDS) submitted by Applicant on 06/08/2018 was considered. However, regarding the references Foulds “Learning Instance Weights in Multi-Instance Learning”, JMP, A Business Unit of SAS “Modeling and Multivariate Methods”, and Rupp et al. “Kernel Methods for Virtual Screening”, it is noted that only a cursory consideration was given to said references in view of the extensive length and scope.

Claim Objections
Claims 1, 5, 10, 14, 24, and 28-30 are objected to because of the following informalities:
Claims 1, 14, and 28-30 recite a typo, “each training object of the set of training object containing”; an optional way to amend this would read “each training object of the set of training objects containing”.
Claim 5 recites a typo, “descending the set of training objects through the decision tree in an order of training object of the ordered list of training objects”, which requires correction.
Claims 10 and 24 recite a typo stating “done an entirety of a process”, and an optional way to amend this would read “done in an entirety of a process”.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-30 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 1, 14, 28, 29, and 30 recite the limitation “generating, for a given training object that has been categorized into the given child node, a prediction quality parameter, the generating being executed based on targets of only those training objects that occur before the given training object in the ordered list of training objects”, but it is unclear to the examiner how a “prediction quality parameter” is generated for a “given training object” without using “the given training object” itself, since “only those training objects that occur before the given training object” are used in this process.
Dependent claims 2-13 and 15-27 are also rejected by virtue of their dependency on the rejected independent claims.

Claim 3 recites the limitation “prediction quality parameters of the at least one training object” three times in lines 12-14, but it is unclear to the examiner whether this refers to the “prediction quality parameters” in line 10 or to different “prediction quality parameters”. Applicant may overcome this rejection by optionally amending to state “the prediction quality parameters of the at least one training object” in lines 12-14.

Claim 6 recites the limitation “generating the prediction quality parameter based on targets of only those training objects that (i) occur on a position before the given position of the given training object in the ordered list of training objects and (ii) have been categorized in a same leaf”, but it is unclear to the examiner what is specifically meant by the claimed “only”, in the sense that the prediction quality parameter is determined only by training objects that are both “(i)” before the given object and “(ii)” categorized in a same leaf, when it has already been claimed to be determined by “only those training objects that occur before the given training object” in claim 1. Applicant may overcome this rejection by optionally deleting the term “only” from the claims.

Claims 7 and 21 recite the limitation “training objects having a least partially different order from others” in lines 10-11 (claim 6), and it is unclear to the Examiner whether this means a lowest partial order, and what exactly would constitute “at least a partially different order from others”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-10, 12, 14-16, 19-24, 26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920).
Regarding claims 1, 14, 28, and 29, Milton teaches a method of determining a prediction quality parameter for a decision tree in a decision tree prediction model, and a server configured to execute a Machine Learning Algorithm (MLA), the MLA being based on a decision tree prediction model based on a decision tree (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach computing system processors with memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” (MLA) for calculating “leaf node” classification “confidence value[s]” or errors (method of determining a prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)), 
a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)), 
the prediction quality parameter being for evaluating prediction quality of the decision tree prediction model at a given iteration of training of the decision tree (paragraphs 0066, 0073-0074, and 0080 teach determining decision tree leaf node classification “training error” (prediction quality parameter being for evaluating prediction quality of the decision tree prediction model) “at a given iteration” (at a given iteration of training of the decision tree)), the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree, the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf ;
the method being executable at a machine learning system that executes the decision tree prediction model (paragraphs 0020, 0059, 0064, 0073-0074, 0080, and 0086 teach processor with memory for executing embodiments of the disclosure of “machine learning” techniques (method being executable at a machine learning system) including “decision trees” (that executes the decision tree prediction model)), 
the method comprising: 
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training object containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated ; 



descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that ; 
generating the prediction quality parameter for the decision tree by: 
generating, for a given training object that has been categorized into the given child node, a prediction quality approximation parameter, the generating being executed based on targets of only those training objects that occur before the given training object in the ordered list of training objects (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (for a given training object that has been categorized into the given child node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (generating…a prediction quality approximation parameter)” for “each leaf node (for a given training object that has been categorized into the given child node)” can be computed “at a given iteration”, before proceeding to other training vectors and iterations (the generating being executed based on targets of only those training objects that occur before the given training object in the ordered list of training objects)); and
at least one prediction quality approximation parameter of the given training object generated during the previous iteration of the training of the previous decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the .

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.).
Further, Milton at least implies the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (see mapping above), however Kimmel teaches the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0006-0007 teach training a decision tree on sorted training data and node threshold values, and then creating a “decision tree ensemble…by repeating the training process as described” for further decision trees (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
Further still, Milton at least implies each training object of the set of training object containing an indication of a document and a target associated with the document (see mapping above), however Kesin teaches each training object of the set of training object containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training object containing an indication of a document and a target associated with the document)).
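
For illustration only, the ordering limitation and the “only those training objects that occur before the given training object” limitation of claims 1, 14, 28, and 29, as interpreted for purposes of this rejection, may be sketched as follows. The function name, the sample targets, and the use of a running mean as the per-object statistic are assumptions made for this sketch; none of them is taken from the application, Milton, Kimmel, or Kesin.

```python
from typing import List, Optional


def prefix_target_means(targets: List[float]) -> List[Optional[float]]:
    """For each position i of an ordered list, average only the targets
    found at positions 0..i-1 (the objects that occur before object i).

    The mean is a hypothetical stand-in for the claimed parameter.
    """
    means: List[Optional[float]] = []
    running_sum = 0.0
    for i, target in enumerate(targets):
        # The first object has no preceding objects, so no value is defined.
        means.append(running_sum / i if i > 0 else None)
        # The object's own target is added only after its value is computed.
        running_sum += target
    return means


# Hypothetical targets, already arranged in the ordered list.
ordered_targets = [1.0, 0.0, 1.0, 1.0]
print(prefix_target_means(ordered_targets))
```

In this sketch the target of the given training object never contributes to its own value, which is the reading of the limitation applied in the mapping above.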


Regarding claim 2, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 1 above; and further teach for a given node having at least one training object categorized in the child node of the given node: 
amalgamating, into a node-level prediction quality prediction parameter, the prediction quality parameters of the at least one training object (Milton, paragraphs 0066-0068, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (for a given training object that has been categorized into the given child node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (into a node-level prediction quality prediction parameter)” for “each leaf node” can be computed “at a given iteration”, and comparing the training errors for pruning leaves (amalgamating, into a node-level prediction quality prediction parameter, the prediction quality parameters of the at least one training object)).

Regarding claim 3, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 2 above; and further teach wherein the amalgamating, into the node-level prediction quality prediction parameter, the prediction quality parameters of the at least one training object comprises one of: 
adding all prediction quality parameters of the at least one training object, generating an average of prediction quality parameters of the at least one training object and applying a formula to prediction quality parameters of the at least one training object (Milton, paragraphs 0066-0068, 0071-0075, 0080, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (for a given training object that has been categorized into the given child node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (into a node-level prediction quality prediction parameter)” for “each leaf node” can be computed “at a given iteration”, and comparing the training errors for pruning leaves (amalgamating, into a node-level prediction quality prediction parameter, the prediction quality parameters of the at least one training object comprises one of…applying a formula to prediction quality parameters of the at least one training object)).
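
For illustration only, the three alternatives recited in claim 3 (adding, averaging, and applying a formula) may be sketched as follows. The function name, the `how` argument, and the sample values are hypothetical and are not drawn from the application or from Milton.

```python
from typing import Callable, List, Optional


def amalgamate(
    params: List[float],
    how: str = "sum",
    formula: Optional[Callable[[List[float]], float]] = None,
) -> float:
    """Combine per-object parameters of one node into a node-level value."""
    if not params:
        raise ValueError("the node must contain at least one training object")
    if how == "sum":
        return sum(params)
    if how == "average":
        return sum(params) / len(params)
    if how == "formula":
        if formula is None:
            raise ValueError("a formula must be supplied")
        return formula(params)
    raise ValueError(f"unknown amalgamation mode: {how}")


# Hypothetical per-object parameters for the objects in one node.
node_params = [0.5, 0.25, 0.25]
print(amalgamate(node_params, "sum"))
print(amalgamate(node_params, "average"))
print(amalgamate(node_params, "formula", formula=max))
```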

Regarding claim 4, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 1 above; and further teach for the given level of the decision tree, the given level having at least one node, amalgamating into a total-level prediction quality parameter, node level quality prediction parameter the prediction quality parameters of the at least one node (Milton, paragraphs 0066-.

Regarding claim 5, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 1 above; and further teach wherein descending comprises: 
descending the set of training objects through the decision tree in an order of training object of the ordered list of training objects (paragraphs 0066-0068, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree in an order of training object of the ordered list of training objects), through “a plurality of training iterations” and that “training error” for “each leaf node” can be computed “at a given iteration”).

Regarding claim 6, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 5 above; and further teach wherein the generating the prediction quality parameter for the given training object having a given position in the ordered list of training objects comprises: 
generating the prediction quality parameter based on targets of only those training objects that  (ii) have been categorized in a same leaf (paragraphs 0066-0068, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (wherein the generating the prediction quality parameter for the given training object having a given position in the ordered list of training objects comprises), through “a plurality of training iterations” and that “training error” for “each leaf node” ((ii) have been categorized in a same leaf) can be computed (generating the prediction quality parameter based on targets of only those training objects that) “at a given iteration” of the training vectors ((ii) have been categorized in a same leaf)).
However, Milton does not explicitly teach that (i) occur on a position before the given position of the given training object in the ordered list of training objects.
Kimmel teaches that (i) occur on a position before the given position of the given training object in the ordered list of training objects (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (based on targets of only those training objects that (i) occur on a position before the given position of the given training object in the ordered list of training objects)”, and comparing results to a node’s “threshold” (generating the prediction quality parameter for the given training object having a given position in the ordered list of training objects comprises). In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it ((i) occur on a position before the given position of the given training object in the ordered list of training objects)).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.
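
For illustration only, the two conditions of claim 6, as interpreted for purposes of this rejection, may be sketched as follows: the value for a given training object draws on targets of objects that both (i) occur earlier in the ordered list and (ii) were categorized in the same leaf. The function name, the leaf labels, and the use of a mean are assumptions made for this sketch and do not appear in the application or the cited references.

```python
from collections import defaultdict
from typing import Hashable, List, Optional


def same_leaf_prefix_means(
    targets: List[float], leaf_ids: List[Hashable]
) -> List[Optional[float]]:
    """For each object, average the targets of earlier objects in its leaf."""
    if len(targets) != len(leaf_ids):
        raise ValueError("targets and leaf_ids must have the same length")
    sums = defaultdict(float)   # running target sum per leaf
    counts = defaultdict(int)   # number of earlier objects per leaf
    result: List[Optional[float]] = []
    for target, leaf in zip(targets, leaf_ids):
        # Only earlier objects in the same leaf have been accumulated so far.
        result.append(sums[leaf] / counts[leaf] if counts[leaf] else None)
        sums[leaf] += target
        counts[leaf] += 1
    return result


# Hypothetical ordered targets and the leaf each object was categorized into.
print(same_leaf_prefix_means([1.0, 0.0, 1.0, 1.0], ["a", "b", "a", "a"]))
```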

Regarding claims 7 and 21, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 1 and 14 above; and further teach wherein the organizing the set of training objects into an ordered list of training objects comprises: 
generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (there is at least one of: (i) a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (generating a plurality of ordered lists of training objects, each of the plurality of ordered lists of training objects being organized such that for each given training object in the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (there is at least one of: (ii) a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it); 
a given one of the plurality of ordered lists of training objects having a least partially different order from others of the plurality of ordered lists of training objects (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (lists of training objects), each corresponding to one of a plurality of attributes. The data subsets (lists of training objects) are distributed to a plurality of slave processing units after sorting the data samples (a given one of the plurality of ordered lists of training objects having a least partially different order from others of the plurality of ordered lists of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”. In other words, different organized training data sample subsets are distributed to different sources (a given one of the plurality of ordered lists of training objects having a least partially different order from others of the plurality of ordered lists of training objects).).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 8 and 22, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 7 and 21 above; and further teach selecting the given one of the plurality of ordered lists of training objects (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree (selecting the given one of the plurality of ordered lists of training objects)).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 9 and 23, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 8 and 22 above; and further teach wherein the selecting is done for each iteration of generating of the prediction quality parameter (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree for comparing results to a node’s “threshold” (generating of the prediction quality parameter), and that this occurs “During each tree level [training] iteration” (wherein the selecting is done for each iteration of generating of the prediction quality parameter)).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 10 and 24, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 8 and 22 above; and further teach wherein the selecting is done an entirety of a process of verification of prediction quality for a given decision tree (Kimmel, paragraphs 0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree for comparing results to a node’s “threshold” (wherein the selecting is done an entirety of a process of verification of prediction quality for a given decision tree), and that this occurs “During each tree level [training] iteration” (wherein the selecting is done an entirety of a process of verification of prediction quality for a given decision tree)).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claims 12 and 26, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 1 and 14 above; and further teach wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with a rule (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (set of training objects is not associated with an inherent temporal relationship of training objects). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects) in consecutive ascending order (in accordance with a rule) by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claim 15, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 14 above; and further teach wherein the method further comprises calculating an indication of the at least one quality approximation parameter of the given training object generated during the previous iteration of the training of the previous decision tree (Milton, paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (calculating an indication of the at least one quality approximation parameter of the given training object)” for “each leaf node (of the given training object)” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision tree)).

Regarding claim 16, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 15 above; and further teach wherein the calculating comprises: 
splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (splitting the ordered list of training objects into a plurality of chunks), each corresponding to one of a plurality of attributes. The data subsets (by splitting the ordered list of training objects into a plurality of chunks) are distributed to a plurality of slave processing units after sorting the data samples (the plurality of chunks being organized into at least two levels of chunks) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level” and comparing to a node’s “threshold”. In other words, different organized training data sample subsets are distributed to different sources (the plurality of chunks being organized into at least two levels of chunks).).
Milton, Kimmel, and Kesin are combinable for the same rationale as set forth above with respect to claims 1 and 14.

Regarding claim 19, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 16 above; and further teach wherein the calculating the indication of the at least one quality approximation parameter of the given training object generated during the previous iteration of the training of the previous decision tree comprises: 
for the given training object, calculating at least one quality approximation parameter based on the training objects located in the same chunk as the given training object (Milton, paragraphs 0066, 0073-0075, and 0084 teach “navigating the .

Regarding claim 20, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 19 above; and further teach wherein the generating the prediction quality parameter for the given level of a decision tree comprises: 
using quality approximation parameters of past training objects located in a largest chunk that does not contain the given training object (Milton, paragraphs 0066, 0073-0075, and 0084 teach iteratively “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree for classifying “a vector” (of the given training object) of the training vectors that can be “sorted”, during a training iteration of “a plurality of training iterations”, and computing “training error” (calculating the indication of the at least one quality approximation parameter) for “each leaf node” can be computed “at a given iteration” or total iterations of different training data sets (using quality approximation parameters of past training objects located in a largest chunk that does not contain the given training object), and .

Regarding claim 30, Milton teaches a method of determining a prediction quality parameter for a decision tree in a decision tree prediction model (paragraphs 0020, 0073-0074, 0080, 0086, and 0093 teach computing system processors with memory, such as “servers”, executing embodiments of the disclosure of “machine learning” techniques including “decision trees” for calculating “leaf node” classification “confidence value[s]” or errors (method of determining a prediction quality parameter) of “decision trees” (for a decision tree in a decision tree prediction model)),
a given level of the decision tree having at least one node (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer (a given level of the decision tree), including “leaf nodes” (having at least one node)),
the prediction quality parameter being for evaluating prediction quality of the decision tree prediction model at a given iteration of training of the decision tree (paragraph 0066, 0073-0074, and 0080 teach determining decision tree leaf node classification “training error (prediction quality parameter being for evaluating prediction quality of the decision tree prediction model) at a given iteration (at a given iteration of training of the decision tree)”), the given iteration of training of the decision tree having at least one previous iteration of training of a previous decision tree, the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf ;
the method being executable at a machine learning system that executes the decision tree prediction model (paragraphs 0020, 0059, 0064, 0073-0074, 0080, and 0086 teach a processor with memory for executing embodiments of the disclosure of “machine learning” techniques (method being executable at a machine learning system) including “decision trees” (that executes the decision tree prediction model)),
the method comprising: 
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), each training object of the set of training object containing an indication of a document and a target associated with the document (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach outputting “articles” from user training data (and a target associated ; 



descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized, by the decision tree model at the given iteration of training, into a given child node of the at least one node of the given level of the decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” in at least one layer of the decision tree classifying the training vectors that can be “sorted” (descending the set of training objects through the decision tree so that each one of the set of training objects gets categorized…into a given child node of the at least one node of the given level of the decision tree), through “a plurality of training iterations” and that ; 
generating the prediction quality parameter for the decision tree by: 
generating, for a given training object that has been categorized into the given child node, a prediction quality approximation parameter, the generating being executed based on: 
targets of only those training objects that occur before the given training object in the ordered list of training objects (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector (for a given training object that has been categorized into the given child node)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (generating…a prediction quality approximation parameter)” for “each leaf node (for a given training object that has been categorized into the given child node)” can be computed “at a given iteration”, before proceeding to other training vectors and iterations (the generating being executed based on targets of only those training objects that occur before the given training object in the ordered list of training objects)); and
at least one prediction quality approximation parameter of the given training object generated during the previous iteration of the training of the previous decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the ;
calculating an indication of the at least one quality approximation parameter of the given training object generated during the at least one previous iteration of the training of the previous decision tree (paragraphs 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying “a vector” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (calculating an indication of the at least one quality approximation parameter of the given training object)” for “each leaf node (of the given training object)” can be computed “at a given iteration”, before proceeding to other training vectors, iterations, and/or decision trees (generated during the previous iteration of the training of the previous decision tree)).

However, Milton does not explicitly teach organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: (i) a preceding training object that occurs before the given training object and (ii) a subsequent training object that occurs after the given training object; and ...by splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks.
Kimmel teaches organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of: 
(i) a preceding training object that occurs before the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.), and
…by splitting the ordered list of training objects into a plurality of chunks, the plurality of chunks being organized into at least two levels of chunks (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets (by splitting the ordered list of training objects into a plurality of chunks), each corresponding to one of a plurality of attributes. The data subsets (by splitting the ordered list of training objects into a plurality of chunks) are distributed to a plurality of slave processing units after sorting the data samples (the plurality of chunks being organized into at least two levels of chunks) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level.” In other words, different organized training data sample subsets are distributed to different sources (the plurality of chunks being organized into at least two levels of chunks).).
Further, Milton at least implies the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (see mapping above), however Kimmel teaches the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique (paragraphs 0006-0007 teach training a decision tree on sorted training data and node threshold values, and then creating a “decision tree ensemble…by repeating the training process as described” for further decision trees (the decision tree and the previous decision tree forming an ensemble of tree generated using a decision tree boosting technique)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
Further, Milton at least implies each training object of the set of training object containing an indication of a document and a target associated with the document (see mapping above), however Kesin teaches each training object of the set of training object containing an indication of a document and a target associated with the document (Col. 9, line 58-Col. 10, line 4, Col. 11, line 61-Col. 12, line 42, and Fig. 3 teach “supervised…machine learning” training of a “decision tree”, including document “training data” (each training object of the set of training object containing an indication of a document and a target associated with the document)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, to include “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin in order to specifically train a decision tree on “document” data (Kesin, Col. 9, line 58-Col. 10, line 4).
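For illustration only, the claim 30 limitation of generating an approximation from “targets of only those training objects that occur before the given training object in the ordered list of training objects” may be pictured with the following minimal sketch. It is not drawn from Milton, Kimmel, or Kesin, and it is not asserted to be the applicant's implementation; the function name, the `prior` fallback value, and the use of a running mean per leaf are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def prefix_target_means(
    targets: Sequence[float],
    leaf_ids: Sequence[int],
    prior: float = 0.0,
) -> List[float]:
    """Hypothetical helper: for each object in an ordered list, average the
    targets of the earlier objects that were categorized into the same leaf.

    The object's own target and the targets of later objects are never used.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    approximations: List[float] = []
    for target, leaf in zip(targets, leaf_ids):
        seen = counts.get(leaf, 0)
        # Use only what has been accumulated from preceding objects;
        # fall back to the assumed prior when the leaf has no history yet.
        approximations.append(sums[leaf] / seen if seen else prior)
        # Only now fold the current object's target into the running totals.
        sums[leaf] = sums.get(leaf, 0.0) + target
        counts[leaf] = seen + 1
    return approximations


result = prefix_target_means([1.0, 0.0, 1.0, 1.0], [0, 0, 1, 0])
```

In this sketch the fourth object falls into leaf 0 and receives the mean of the two earlier leaf-0 targets (1.0 and 0.0), which is the sense in which the value depends only on preceding objects in the ordered list.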

Claims 11, 17, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Fano et al (US Pub 20050189415) hereinafter Fano.
Regarding claims 11 and 25, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 1 and 14 above, however the combination does not explicitly teach wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship.
Fano teaches wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship (paragraphs 0159-0160 teach training a decision tree classifier with a training set in which the examples are in “temporal order” (wherein the set of training objects is associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in accordance with the temporal relationship)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include training a decision tree classifier with a training set in which the examples are in “temporal order” as taught by Fano in order to increase accuracy of a prediction based on times related to specific training data (Fano, paragraphs 0052 and 0160).

Regarding claim 17, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 16 above. However, while Kimmel does teach dividing data used for training into subsets, the combination does not explicitly teach wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects.
Fano teaches wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects (paragraphs 0159-0160 teach “the example sets (training objects) were split into a training set, which included the first 80% of examples in temporal order (and wherein a chunk of a lower level of chunks contains a different pre-defined number of training objects, the different pre-defined number of training objects being greater than the first pre-defined number of training objects), and a test set, which included the last 20% (wherein a chunk of a given level of chunks contains a first pre-defined number of training objects)” for a decision tree classifier).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by .

Claims 13 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Hong et al (US Pub 20040111169) hereinafter Hong.
Regarding claims 13 and 27, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claims 1 and 14 above, however the combination does not explicitly teach wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in a randomly generated order.
Hong teaches wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the set of training objects in a randomly generated order (paragraphs 0027 and 0061 teaches training models including “decision trees” on a “randomly permuted” order of “training set” with no timestamps (wherein the set of training objects is not associated with an inherent temporal relationship of training objects and wherein .
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include training models including “decision trees” on a “randomly permuted” order of “training set” as taught by Hong in order to improve “class probability estimation” and “determine the optimal value” of a prediction model (Hong, paragraphs 0008-0009 and 0061).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Kesin (US Patent 9348920), in view of Lee et al (US Patent 5978497) hereinafter Lee.
Regarding claim 18, the combination of Milton, Kimmel, and Kesin teach all the claim limitations of claim 16 above. However, while Kimmel does teach dividing data used for training into subsets, the combination does not explicitly teach wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects.
Lee teaches wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects (Col. 30, lines 40-43 and Col. 34, line 66-Col. 35, line 15 teach a decision tree’s “training data is randomly divided into five equal sets” (wherein a chunk of a given level of chunks contains a first pre-defined number of training objects and wherein a chunk of a lower level of chunks contains the first pre-defined number of training objects and a second set of training objects located sequentially after the first pre-defined number of training objects in the ordered list, a number of training objects within the second set of training objects being the same as the first pre-defined number of training objects)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “supervised…machine learning” training of a “decision tree” on document “training data” as taught by Kesin, to include dividing training data “into five equal sets” as taught by Lee in order to track and improve performance of the classifier (Lee, Col. 34, line 66-Col. 35, line 15).
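For illustration only, the chunk arrangement recited in claim 18, in which a chunk of a lower level contains the first pre-defined number of training objects plus an equally sized second set located sequentially after it, may be pictured as chunk sizes that double from one level to the next. The sketch below is an assumption-laden reading of the claim language, not a teaching of Lee or of any other cited reference; the function name, the half-open `(start, end)` index convention, and the stopping rule are hypothetical.

```python
from typing import List, Tuple


def chunk_levels(num_objects: int, first_size: int) -> List[List[Tuple[int, int]]]:
    """Hypothetical helper: split positions 0..num_objects-1 of an ordered
    list into levels of chunks, doubling the chunk size at each level.

    Each chunk is a half-open (start, end) range of positions.
    """
    if first_size < 1:
        raise ValueError("first_size must be at least 1")
    levels: List[List[Tuple[int, int]]] = []
    size = first_size
    while size <= num_objects:
        # Consecutive, non-overlapping chunks of the current size;
        # the last chunk is clipped if the list length is not a multiple.
        levels.append(
            [(start, min(start + size, num_objects))
             for start in range(0, num_objects, size)]
        )
        size *= 2  # next level: first set plus an equally sized second set
    return levels


levels = chunk_levels(8, 2)
```

With eight objects and a first chunk size of two, the first chunk of the second level spans positions 0 to 4, that is, the first chunk of the first level followed by the next two objects in the ordered list.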

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  
Gopal Samy et al (US Pub 20130345585) teaches decision tree learning by comparing leaf value with actual value to determine accuracy value for signals.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 




/C.M./Examiner, Art Unit 2123
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116