Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the application filed on 06/06/2018.
Claims 1-21 are pending.
Claims 1-21 are rejected.

Information Disclosure Statement
The information disclosure statement (IDS) submitted by Applicant on 06/08/2018 was considered. However, regarding the references Foulds “Learning Instance Weights in Multi-Instance Learning”, JMP, A Business Unit of SAS “Modeling and Multivariate Methods”, and Rupp et al. “Kernel Methods for Virtual Screening”, it is noted that only cursory consideration was given to said references in view of their extensive length and scope.

Claim Objections
Claims 1, 12, 13, and 17 are objected to because of the following informalities:
Claims 1 and 12 contain a typo, reciting “each training object of the set of training object contains”; an optional way to amend this would read “each training object of the set of training objects contains”.
Claim 13 recites typos: “classified in one of child nodes” in line 15 and “separate the range into region” in line 19. Optional ways to amend these would read “classified in one of a plurality of child nodes” and “separate the range into regions”, respectively.
Claim 17 contains an equation variable, “Rconstant”, which should be properly defined within the claim.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-13, 18, and 20-21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
Claims 1, 2, and 12 recite the limitation “at least some of the categorical features values”, and the term “some” is a relative term that renders the claims indefinite. Applicant may optionally amend the claim to recite “at least one or more 
Dependent claims 3-11 are also rejected by virtue of their dependency.

wherein the generating is”, but it is unclear to the examiner if this is referring to the “generating the numeric representation” or “generating the decision tree” of claim 1.

Claims 3 and 4 recite the limitation “the generating is executed for only those prior categorical feature values that have been generated at the at least one prior level of the decision tree”, but it is unclear to the examiner what is specifically meant by the claimed “only” in the sense that the numeric representation is generated only for the prior categorical feature values, when it has already been claimed that the generating for “a current numeric representation” occurs for a “given level of the decision tree” in claim 1. Applicant may overcome this rejection by optionally deleting the term “only” from the claims.

Claim 13 recites the limitation “the split is for causing” in line 14, for which there is insufficient antecedent basis in the claim. It is unclear whether this is meant to refer to the claimed “split value” in line 11 or to an action of splitting the node, which lacks antecedent basis.

Claim 18 recites the limitation “determining which bucket” with insufficient antecedent basis for this limitation in the claim.

Claim 20 recites the limitation “for each decision tree” with insufficient antecedent basis for this limitation in the claim. It is unclear to the Examiner if this is 
Further, claim 20 recites the limitation “a given decision tree”, but it is unclear to the examiner whether this refers to the “decision tree” of claim 13 or to a different “decision tree”.

Claim 21 recites the limitation “a feature”, but it is unclear to the examiner whether this is meant to refer to one of the “categorical features” of claim 13 or to a different “feature” altogether.

The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 6 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Dependent claim 6 states the limitation “the method further comprising organizing the set of training objects into the ordered list of training objects”, which is not narrower than the limitation of claim 2, which states “wherein the set of training objects is organized in an ordered list…”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-8, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Farrell et al (US Patent 5657424) hereinafter Farrell, in view of Venkataraman et al (US Pub 20180232375) hereinafter Venkataraman.
Regarding claims 1 and 12, Milton teaches a method, and a server [analogously claimed for executing the method] of converting a value of a categorical feature into a numeric representation thereof, the categorical feature being associated with a training object used for training a Machine Learning Algorithm (MLA), the MLA using a decision tree model having a decision tree (paragraphs 0059, 0066, 0073-0075, and 0084 teach training a “machine learning” decision tree (training a Machine Learning Algorithm (MLA), the MLA using a decision tree model having a decision tree) by “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) “a vector (the categorical feature being associated with a training object used for training)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that a “training error (numeric representation)” or “confidence value (alternative numeric representation)” for “each leaf node (converting a value of a categorical feature into a numeric representation thereof)” can be computed “at a given iteration”, before proceeding to other training vectors and iterations), the training object being processed in a node of a given level of the decision tree (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer, including “leaf nodes” (being processed in a node of a given level of the decision tree) for classifying training vectors (the training object)),
the MLA executable by an electronic device to predict a value for an in-use object, the method comprising [claim 1]: 
the server comprising: a non-transitory computer-readable medium; a processor coupled to the non-transitory computer-readable medium, the processor configured to [claim 12]:
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0020, 0059, 0064, 0066, 0073-0075, 0080, 0084, and 0086 teach a processor communicatively coupled with a memory (executable by an electronic device/CRM) for executing embodiments of the disclosure of “machine learning” techniques (machine learning system) including “decision trees” (the MLA) and computing a “training error value” for “each leaf node” vector classification (to predict a value for an in-use object); further, paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), 
each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach a tree’s nodes outputting “articles” from user training data including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature)); 
generating the numeric representation of the categorical feature value by:

generating, a current numeric representation for the given level of the decision tree (paragraphs 0059, 0066, 0073-0075, and 0084 teach training a “machine learning” decision tree by “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) “a vector (categorical feature value)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing , 
the generating being done in-line with generating the decision tree (paragraphs 0059, 0066, 0068-0075, and 0084 teach training a “machine learning” decision tree by “navigating the decision tree” with a training set for the tree’s “leaf nodes” and generating a “training error” or “confidence value” for “each leaf node” and accordingly splitting the decision tree nodes to more accurately classify the training vector (the generating being done in-line with generating the decision tree)).

However, Milton does not explicitly teach the decision tree having at least one prior level of the decision tree, the at least one prior level having at least one prior training object having at least one categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree, and generating the numeric representation of the categorical feature value by: retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree; generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least some of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree.
Kimmel teaches the decision tree having at least one prior level of the decision tree, the at least one prior level having at least one prior training object having at least one categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree (paragraphs 0004-0006 teach “During each tree level iteration (decision tree having at least one prior level of the decision tree) a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (prior training object having at least one categorical feature value). The data subsets are distributed to a plurality of slave processing units after sorting the data samples in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (prior level having at least one prior training object having at least one categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree)”. Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (prior training object having at least one categorical feature value having been converted to a prior numeric representation thereof).).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of training every 
However, Kimmel does not explicitly teach generating the numeric representation of the categorical feature value by: retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree; generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least some of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree.
Farrell teaches generating the numeric representation of the categorical feature value by:
retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree (Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22 teach associating labeled training vectors with “non-terminal” or “root” nodes of a “decision tree” (prior level of the decision tree) wherein these nodes (prior level of the decision tree) classify each of the training vectors as a word (at least one prior categorical feature value for a given object of the set of training objects) and then convert the result to a numeric value, where a match being “1” (prior numeric representation) or non-; 
generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least some of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree (Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the “root node” classifications (prior categorical feature value at the at least one prior level of the decision tree) and corresponding numeric classifications of supervised training vectors (as mapped above) are passed on to each “terminal” or “child” node to “recursively” perform the “procedure” of the node classifying the supervised training vectors (categorical features values of the set of training objects) if the root node deems the vector appropriate to pass on (each combination) and convert the result to a corresponding numeric value (generating…a current numeric representation for the given level of the decision tree)).
Milton at least implies the generating being done in-line with generating the decision tree; however, Farrell teaches the generating being done in-line with generating the decision tree (Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach training a “decision tree classifier algorithm” as mapped above (the generating) while splitting nodes and adding levels while limiting the “tree structure” to not “grow beyond a predetermined level” (being done in-line with generating the decision tree)).

Additionally, Milton at least implies each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature; however, Venkataraman teaches each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature (paragraphs 0033-0036, 0043-0044, 0048, 0052-0054, and Figs. 3 and 13 teach building a training set of documents (each training object of the set of training object contains a document) with associated user click information (and an event indicator associated with the document), job description field (each document being associated with a categorical feature), and related queries (alternative each document being associated with a categorical feature) for training a decision tree).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by 

Regarding claim 2, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 1 above; and further teaches wherein the set of training objects is organized in an ordered list such that: 
for each given training object in the ordered list of training objects there is at least one of:
(i) a preceding training object that occurs before the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.), and wherein 
the at least some of the categorical features values are those categorical features values associated with training objects that appear earlier in the ordered list of training objects (Kimmel, the at least some of the categorical features values are those categorical features values associated with training objects that appear earlier in the ordered list of training objects). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).

Regarding claim 3, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 1 above; and further teaches wherein the generating is executed for only those prior categorical feature values that have been generated at the at least one prior level of the decision tree (Farrell, Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the “root node” classifications (only those prior categorical feature values that have been generated at the at least one prior level of the decision tree) and corresponding numeric classifications of supervised training vectors (as mapped above) are passed on to each “terminal” or “child” node to “recursively” perform the “procedure” of the node classifying the supervised training vectors if the root node deems the vector appropriate to pass on and convert the result to a corresponding numeric value (the generating is executed)).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 4, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 1 above; and further teaches wherein the generating is executed for only those prior categorical feature values that have been generated at the at least one prior level of the decision tree and at least one previous iteration of the decision tree (Farrell, Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the previous “root node” classifications (only those prior categorical feature values that have been generated at the at least one prior level of the decision tree and .
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 5, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 1 above; and further teaches wherein the event indicator has a pre-determined value, the pre-determined value being one of a positive outcome and a negative outcome (Venkataraman, paragraphs 0020, 0033-0036, 0043-0044, 0048, 0052-0054, and Figs. 3 and 13 teach building a training set of documents with associated user click information (event indicator) being “positive user actions” when clicking the document (event indicator has a pre-determined value, the pre-determined value being one of a positive outcome and a negative outcome) for training a decision tree).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 6, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 2 above; and further teaches the method further comprising organizing the set of training objects into the ordered list of training objects (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (organizing the set of training objects into the ordered list of training objects)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order.).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Regarding claim 7, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 6 above; and further teaches wherein the organizing the training objects into the ordered list of training objects is executed at a point in time prior to the generating of the numeric value (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (organizing the set of training objects into the ordered list of training objects)”. Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (wherein the organizing the training objects into the ordered list of training objects is executed at a point in time prior to the generating of the numeric value). In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order and then split function values are determined for each node corresponding to the training data (wherein the organizing the training objects into the ordered list of training objects is executed at a point in time prior to the generating of the numeric value).).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Regarding claim 8, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 6 above; and further teaches wherein the organizing the set of training objects into the ordered list of training objects comprises organizing a plurality of sets of ordered lists and wherein the method further comprises, prior to the generating of the numeric value selecting a given one of the plurality of sets of ordered lists (Kimmel, paragraphs 0004-0006, 0017, 0020, and 0059 teach using a subset of the training data sample subsets for training a decision tree (selecting a given one of the plurality of sets of ordered lists), and further teach “sorting the data samples (organizing the set of training objects into the ordered list of training objects comprises organizing a plurality of sets of ordered lists and wherein the method further comprises) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (organizing the set of training objects into the ordered list of training objects comprises organizing a plurality of sets of ordered lists and wherein the method further comprises)”. Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (wherein the method further comprises, prior to the generating of the numeric value selecting…). In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order, selected, and then split function values are determined for each node corresponding to the training data (wherein the method further comprises, prior to the generating of the numeric value selecting…).).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Regarding claim 10, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 6 above; and further teaches wherein the training objects are not associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the training objects in accordance with a pre-determined rule (Kimmel, paragraphs 0004-0006 teach “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (set of training objects are not associated with an inherent temporal order). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into the ordered list of training objects comprises organizing the training objects) in consecutive ascending order (in accordance with a pre-determined rule) by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Farrell et al (US Patent 5657424) hereinafter Farrell, in view of Venkataraman et al (US Pub 20180232375) hereinafter Venkataraman, in view of Fano et al (US Pub 20050189415) hereinafter Fano.
Regarding claim 9, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 6 above; however, the combination does not explicitly teach wherein the training objects are associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the training objects in accordance with the temporal order.
Fano teaches wherein the training objects are associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the training objects in accordance with the temporal order (paragraphs 0159-0160 teach training a decision tree classifier with a training set in which the examples are in “temporal order” (wherein the set of training objects are associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises organizing the training objects in accordance with the temporal order)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “root node” classification influence on dependent nodes during training of decision tree algorithm building as taught by Farrell, as modified by decision tree training set of documents with associated user click information, job description field, and search query as taught by Venkataraman, to include training a decision tree classifier with a training set in which the examples are in “temporal order” as taught by Fano in order to increase the accuracy of a prediction based on times related to specific training data (Fano, paragraphs 0052 and 0160).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Farrell et al (US Patent 5657424) hereinafter Farrell, in view of Venkataraman et al (US Pub 20180232375) hereinafter Venkataraman, in view of Hong et al (US Pub 20040111169) hereinafter Hong.
Regarding claim 11, the combination of Milton, Kimmel, Farrell, and Venkataraman teaches all the claim limitations of claim 6 above; however, the combination does not explicitly teach wherein the training objects are not associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises generating a random order of the training objects to be used as the ordered list.
Hong teaches wherein the training objects are not associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises generating a random order of the training objects to be used as the ordered list (paragraphs 0027 and 0061 teach training models including “decision trees” on a “randomly permuted” order of a “training set” with no timestamps (wherein the set of training objects is not associated with an inherent temporal order, and wherein the organizing the set of training objects into the ordered list of training objects comprises generating a random order of the training objects to be used as the ordered list)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing .


Claims 13-21 are rejected under 35 U.S.C. 103 as being unpatentable over Farrell et al (US Patent 5657424) hereinafter Farrell, in view of McFall et al (US Pub 20200327252) hereinafter McFall.
Regarding claim 13, Farrell teaches a method of generating a split value for a node in a decision tree of a decision tree model used by a Machine Learning Algorithm (MLA), the split value being for a node at a particular level of the decision tree (Col. 8, line 34-Col. 9, line 29 and equation 1 teach recursively splitting nodes in a decision tree classifier algorithm (of a decision tree model used by a Machine Learning Algorithm (MLA)) according to a calculated dividing plane value (generating a split value for a node in a decision tree) for an area of plotted vectors for nodes at different levels of the tree (the split value being for a node at a particular level of the decision tree)), the node for classifying an object having a categorical feature value that is to be translated into a numeric value representative thereof (Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22 teach, at each node in specific levels of a decision tree, classifying supervised training vectors (the node for classifying an object having a categorical feature value) and converting the result (that is to be translated into a) to a numeric value, where a match is “1” (numeric value representative thereof) or a non-match is “0” (numeric value representative thereof)), the split is for causing the object to be classified in one of child nodes of the node based on the numeric value and the split value, the MLA executable by an electronic device to predict a value for an in-use object (Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach a processor training a decision tree classifier algorithm that includes recursively splitting nodes in a decision tree according to a calculated dividing plane value (the split is for causing) for an area of plotted vectors for nodes at different levels of the tree, and the splitting creating “child nodes” of a “root” or “non-terminal” node that properly classify the supervised training vectors (the object to be classified in one of child nodes of the node… the MLA executable by an electronic device to predict a value for an in-use object), such that the “root” or “non-terminal” node classifications and corresponding numeric match classifications determine to which split child node the training vector is sent (based on the numeric value and the split value)), the method comprising:
generating a range of all possible values of the categorical features (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach nodes are split according to a process, wherein “the oval area T represents a region in which all ; 
applying a grid to the range to separate the range into region, each region having a boundary (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4  teach defining “a determinant plane” to a plotted area (grid) of plotted vector regions of all values (applying a grid to the range) to divide the regions into associated node areas (to separate the range into region) that are depicted as having a limited distance from the centroid within the plotted area and the plane (each region having a boundary)); 
using the boundary as the split value (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value)); 
the generating and the applying being executed before the categorical feature value is translated into the numeric representation thereof (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach splitting nodes and forming a decision tree in order to then (the generating and the applying being executed before) properly classify the training vectors (categorical feature .

Farrell at least implies a grid (see mapping above); however, McFall explicitly teaches a grid (paragraphs 0482-0491 teach splitting a decision tree node according to the node’s corresponding data being divided into ranges with boundaries of “approximately equal population” (a grid utilization)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement McFall’s teachings of splitting decision tree nodes according to a data range division and boundaries into Farrell’s teaching of parent node classification influence on dependent nodes and node splitting operations during training of decision tree algorithm building in order to increase classification accuracy of a decision model through developing specific classification nodes (McFall, paragraphs 0482-0491).

Regarding claim 14, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the grid has a pre-determined format (Farrell, Col. 8, line 34-Col. 9, line 29 and equation 1 teach the plane applied to a plotted area (grid) being greater than “0” (has a pre-determined format)).

Regarding claim 15, the combination of Farrell and McFall teaches all the claim limitations of claim 14 above; and further teaches wherein the grid is one of a regular interval grid and an irregular interval grid (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach a plane dividing vectors plotted in an area, thus creating two different sections that can be dimensionally unequal (wherein the grid is…an irregular interval grid)).

Regarding claim 16, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the range is between zero and one (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach that nodes are split according to a plotted area and divided “region[s]”, wherein the regions represent a match or non-match for the supervised training vectors and are converted to a representative numeric range of match “1” or non-match “0” (range is between zero and one)).

Regarding claim 17, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the numeric representations of the categorical feature values are calculated using an Rconstant and wherein the range is between Rconstant and 1+(Rconstant) (Farrell, Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach, at each node in specific levels of a decision tree, classifying supervised training vectors (categorical feature values) and converting the result (calculated using) to a numeric value (wherein the numeric representations of the categorical feature values are calculated), wherein the conversion includes a resultant non-match being “0” (wherein the range is between Rconstant) or match being “1” (and .

Regarding claim 18, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the method further comprises, during an in-use phase, for a given counter value representing a categorical feature, determining which bucket defined by the grid the given counter value falls into and using the associated boundaries as values for splits (Farrell, Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach, while training and also executing the decision tree for classifying vectors (during an in-use phase), plotting the vectors in an area with a calculated dividing plane (and using the associated boundaries as values for splits) and determining “the total number of vectors of both classes (for a given counter value representing a categorical feature, determining which bucket defined by the grid the given counter value falls into) assigned to the terminal node”).

Regarding claim 19, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the using the boundary as the split value is executed for each level of the decision tree and wherein the method further comprises, once a given level of the decision tree is trained, re-calculating the split value (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in .

Regarding claim 20, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the using the boundary as the split value is executed for each decision tree and wherein the method further comprises, once a given decision tree is trained, re-calculating the split value (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value). The splitting process is taught to be performed “recursively” for nodes in each “level” of a decision tree (is executed for each level of the decision tree), wherein the boundaries are replotted at each training iteration of each level of a decision tree, and further for training each tree of the “decision trees for each of the target words” (wherein the method further comprises, once a given decision tree is trained, re-calculating the split value)).

Regarding claim 21, the combination of Farrell and McFall teaches all the claim limitations of claim 13 above; and further teaches wherein the using the boundary as the split value is executed during training of the MLA and wherein the training of the MLA, during a given iteration of one of: (i) a given level of the decision tree and (ii) a given iteration of the decision tree (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value). The splitting process is taught to be performed “recursively” for nodes in each “level” of a decision tree (a given level of the decision tree), wherein the boundaries are replotted at each training iteration of each level of a decision tree classifier algorithm (is executed during training of the MLA and wherein the training of the MLA, during a given iteration of one of: (i) a given level of the decision tree and (ii) a given iteration of the decision tree)), comprises:
selecting a best value of a feature to be placed at the given iteration and a best value of the split associated therewith (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid and from the dividing plane in order to “split” nodes with corresponding vector value classification regions (selecting…a best value of the split associated therewith). The splitting process is taught to be performed “recursively” (iteratively) for nodes in each “level” of a decision tree, wherein the boundaries are replotted at each training iteration of each level of a decision tree classifier algorithm to properly .

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  
Nettleton et al (“Analysis of Web Search Engine Clicked Documents”, 2006) teaches training decision trees on document categories and associated user click status.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 




/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123