Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the amendments and remarks filed on 12/01/2021.
Claims 1-5 and 7-21 are pending.
Claims 1-5, 7-13, 17-18, and 20-21 have been amended.
Claim 6 has been canceled.

Response to Arguments
Applicant’s arguments, with respect to the claim objections, have been fully considered and are persuasive. Therefore, the objections set forth in the previous office action have been withdrawn. However, upon further consideration, a new ground of objection has been made.

Applicant’s arguments, with respect to select rejections of claim(s) 1-13, 18, and 20-21 under 35 U.S.C. 112(b), have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. 

Applicant’s arguments, with respect to select rejections of claim(s) 6 under 35 U.S.C. 112(d), have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. 

Applicant’s second argument, with respect to the rejection(s) of claim(s) 1 and 12 under 35 U.S.C. 103, has been considered but is not persuasive. More specifically, applicant argues that no cited reference teaches the amended claim limitations for claims 1 and 12, since no reference teaches “generating a current numeric representation based on ‘preceding training objects in the ordered list,’” as claimed. The examiner respectfully disagrees.
Due to the broadness of the claim language, Milton has been found to teach the claim limitations. Paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) each “vector” of the training vectors in the training set as mapped above (training objects in the ordered list with the at least one of the categorical feature values in the ordered list) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (numeric representation)” for “each leaf node (current numeric representation)” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set examples (with the at least one of the categorical feature values in the ordered list), before proceeding to other training vectors and iterations (a number of total occurrences of preceding training objects in the ordered list).
Further, see 35 U.S.C. 103 section for full mapping of claim limitations necessitated by applicant amendments.

Applicant’s second argument, with respect to the rejection(s) of claim(s) 13 under 35 U.S.C. 103, has been considered but is not persuasive. More specifically, applicant argues that no cited reference teaches the amended claim limitations for claim 13, since Farrell does not teach “the generating and the applying being executed before the categorical feature value is translated into the numeric representation thereof” as claimed; “[r]ather, in Farrell, the feature vectors are formed first, and then the decision tree is trained”. The examiner respectfully disagrees.
Primarily, it is noted that while Farrell’s training vectors can be formed before training (as argued) “[a]ccording to one application for the invention [col. 6, line 29]”, the classifications of the vectors are not yet assigned, nor are the vectors converted to “a numeric value… in addition to a confidence value of the classification [as previously mapped]” (generating and applying before “value is translated” as claimed).
Therefore, due to the broadness of the claim language, Farrell is maintained as having been found to teach the language required by the claim limitations. Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach splitting nodes and forming a decision tree in order to then (the generating and the applying being executed before) properly classify the training vectors (categorical feature value) and convert the result to a numeric value (representation) in addition to a confidence value of the classification (alternative numeric representation).
Further, see 35 U.S.C. 103 section for full mapping of claim limitations.

Claim Objections
Claim 20 is objected to because of the following informalities:
Claim 20 recites a typo stating “re-calculating the each split value”, and an optional way to amend this would read “re-calculating each of the s”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 7-8, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Farrell et al (US Patent 5657424) hereinafter Farrell, in view of Venkataraman et al (US Pub 20180232375) hereinafter Venkataraman.
Regarding claims 1 and 12, Milton teaches a method, and a server [analogously claimed for executing the method] of converting a value of a categorical feature into a numeric representation thereof, the categorical feature being associated with a training object used for training a Machine Learning Algorithm (MLA), the MLA using a decision tree model having a decision tree (paragraphs 0059, 0066, 0073-0075, and 0084 teach training a “machine learning” decision tree (training a Machine Learning Algorithm (MLA), the MLA using a decision tree model having a decision tree) by “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) “a vector (the categorical feature being associated with a training object used for training)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and teach that “training error (numeric representation)” or “confidence value (alternative numeric representation)” for “each leaf node (converting a value of a categorical feature into a numeric representation thereof)” can be computed “at a given iteration”, before proceeding to other training vectors and iterations), the training object being processed in a node of a given level of the decision tree (paragraphs 0073-0074 teach a decision tree, well known to include at least one layer, including “leaf nodes” (being processed in a node of a given level of the decision tree) for classifying training vectors (the training object)),
the MLA executable by an electronic device to predict a value for an in-use object, the method comprising [claim 1]: 
the server comprising: a non-transitory computer-readable medium; a processor coupled to the non-transitory computer-readable medium, the processor configured to [claim 12]:
accessing, from a non-transitory computer-readable medium of the machine learning system, a set of training objects (paragraphs 0020, 0059, 0064, 0066, 0073-0075, 0080, 0084, and 0086 teach processor communicatively coupled with a memory (executable by an electronic device/CRM) for executing embodiments of the disclosure of “machine learning” techniques (machine learning system) including “decision trees” (the MLA) and computing a “training error value” for “each leaf node” vector classification (to predict a value for an in-use object), and further paragraphs 0010-0011, 0020, 0090, and 0094 teach storing user data in memory (CRM), and paragraphs 0051, 0056-0059, and 0067 teach “training set” data being collected user data (accessing CRM) including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), wherein the set of training objects is organized in an ordered list of training objects (Examiner note: Applicant’s spec, paragraph 0105 states “a random order of the training 
Milton, paragraphs 0051, 0056-0059, and 0067 teach “training set” data (ordered list) being collected user data including “existing records for users” and represented “as a collection of vectors” (a set of training objects)), wherein each training object of the set of training object contains a document and an event indicator associated with the document, and wherein each document is associated with a categorical feature (paragraphs 0028, 0052, 0055, 0068, 0074, and 0084 teach a tree’s nodes outputting “articles” from user training data including “existing records for users”, such as “purchasing history, media viewing history, automotive records, social networking activity, and the like” (each training object of the set of training object contains a document and an event indicator associated with the document, and wherein each document is associated with a categorical feature)); 
generating the numeric representation of the categorical feature value by:

generating, a current numeric representation for the given level of the decision tree (paragraphs 0059, 0066, 0073-0075, and 0084 teach training a “machine learning” decision tree by “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) “a vector (categorical feature value)” of the training vectors in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (numeric representation)” or “confidence value (alternative numeric representation)” for “each leaf node (generating…a current numeric representation for the given level of the decision tree)” can be computed “at a given iteration”, before proceeding to other training vectors and iterations), wherein the current numeric representation is generated based on: 
(i) a number of total occurrences of preceding training objects in the ordered list with the at least one of the categorical feature values in the ordered list (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) each “vector” of the training vectors in the training set as mapped above (training objects in the ordered list with the at least one of the categorical feature values in the ordered list) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error (numeric representation)”); and
(ii) a number of pre-determined outcomes of events associated with the preceding training objects with the at least one of the categorical feature values in the ordered list (paragraphs 0055, 0066, 0073-0075, and 0084 teach “navigating the decision tree” with a training set for the tree’s “leaf nodes” for classifying (categorical) each “vector” of the training vectors in the training set as mapped above (preceding training objects with the at least one of the categorical feature values in the ordered list) in “a leaf node”, during a training iteration of “a plurality of training iterations”, and computing “training error” for “each leaf node” can be computed “at a given iteration” when (based on) “audience membership status is known” for the training set examples (a number of pre-determined outcomes of events associated with the preceding training objects with the at least one of the categorical feature values in the ordered list), before proceeding to other ,
the generating being done in-line with generating the decision tree (paragraphs 0059, 0066, 0068-0075, and 0084 teach training a “machine learning” decision tree by “navigating the decision tree” with a training set for the tree’s “leaf nodes” and generating a “training error” or “confidence value” for “each leaf node” and accordingly splitting the decision tree nodes to more accurately classify the training vector (the generating being done in-line with generating the decision tree)).
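
For illustration only, limitations (i) and (ii) as recited above can be read as two running counters maintained over the ordered list. The following is a minimal sketch under that reading and is not taken from the application or from any cited reference; the function name `ordered_counts`, the (categorical value, event indicator) pair layout, and the sample data are hypothetical. The claim language does not recite how the two counts are combined, so the sketch returns them separately.

```python
from collections import defaultdict


def ordered_counts(ordered_objects):
    """Count, for each training object, the preceding objects sharing its category.

    ordered_objects: list of (categorical_value, event_indicator) pairs, where
    event_indicator is 1 for the pre-determined outcome and 0 otherwise
    (hypothetical layout). Returns one (total_preceding, outcomes_preceding)
    pair per object, computed only from objects earlier in the list.
    """
    totals = defaultdict(int)    # (i) occurrences seen so far, per category
    outcomes = defaultdict(int)  # (ii) pre-determined outcomes seen so far
    result = []
    for category, event in ordered_objects:
        # Read the counters before updating so the current object is excluded.
        result.append((totals[category], outcomes[category]))
        totals[category] += 1
        outcomes[category] += event
    return result


# Hypothetical ordered list of five training objects.
sample = [("a", 1), ("b", 0), ("a", 0), ("a", 1), ("b", 1)]
print(ordered_counts(sample))  # [(0, 0), (0, 0), (1, 1), (2, 1), (1, 0)]
```

Because each pair is read before the counters are updated, the value produced for a given object depends only on objects that precede it in the ordered list.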

However Milton does not explicitly teach wherein the set of training objects is organized in an ordered list of training objects, the decision tree having at least one prior level of the decision tree, the at least one prior level having at least one prior training object having at least one prior categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree, and generating the numeric representation of the categorical feature value by: retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree; generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least one of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree.
Kimmel teaches the decision tree having at least one prior level of the decision tree, the at least one prior level having at least one prior training object having at least one prior categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree (paragraphs 0004-0006 teaches “During each tree level iteration (decision tree having at least one prior level of the decision tree) a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (prior training object having at least one prior categorical feature value). The data subsets are distributed to a plurality of slave processing units after sorting the data samples in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (prior level having at least one prior training object having at least one prior categorical feature value having been converted to a prior numeric representation thereof for the at least one prior level of the decision tree)”. Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (prior training object having at least one prior categorical feature value having been converted to a prior numeric representation thereof).).
Further, Milton at least implies wherein the set of training objects is organized in an ordered list of training objects (see mapping above), however Kimmel teaches wherein the set of training objects is organized in an ordered list of training objects (paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (the set of training objects is organized in an ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (the set of training objects is organized in an ordered list of training objects)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of training every level of a decision tree and “sorting the [training] data samples” into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets and training each level accordingly (Kimmel, paragraphs 0004-0006).
However Kimmel does not explicitly teach generating the numeric representation of the categorical feature value by: retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree; generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least one of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree.
Farrell teaches generating the numeric representation of the categorical feature value by:
retrieving the prior numeric representation of the at least one prior categorical feature value for a given object of the set of training objects at the at least one prior level of the decision tree (Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22 teach associating labeled training vectors with “non-terminal” or “root” nodes of a “decision tree” (prior level of the decision tree) wherein these nodes (prior level of the decision tree) classify each of the training vectors as a word (at least one prior categorical feature value for a given object of the set of training objects) and then convert the result to a numeric value, where a match being “1” (prior numeric representation) or non-match being “0” (prior numeric representation), and passing the results forward for further classification in the “terminal node[s]” (retrieving)); 
generating, for each combination of the at least one prior categorical feature value at the at least one prior level of the decision tree and at least one of the categorical features values of the set of training objects, a current numeric representation for the given level of the decision tree (Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the “root node” classifications (prior categorical feature value at the at least one prior level of the decision tree) and corresponding numeric classifications of supervised training vectors (as mapped 
Milton at least implies the generating being done in-line with generating the decision tree, however Farrell teaches the generating being done in-line with generating the decision tree (Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach training a “decision tree classifier algorithm” as mapped above (the generating) while splitting nodes and adding levels while limiting the “tree structure” to not “grow beyond a predetermined level” (being done in-line with generating the decision tree)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by training every level of a decision tree and “sorting the [training] data samples” as taught by Kimmel, to include “root node” classification influence on dependent nodes during training of decision tree algorithm building as taught by Farrell in order to increase granular accuracy of the overall tree when classifying training vectors (Farrell, Col. 7, line 27-Col. 8, line 33).
Additionally, Milton at least implies each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature, however Venkataraman teaches each training object of the set of training object contains a document and an event indicator associated with the document, each document being associated with a categorical feature (paragraphs 0033-0036, 0043-0044, 0048, 0052-0054, and Figs. 3 and 13 teach building a training set of documents (each training object of the set of training object contains a document) with associated user click information (and an event indicator associated with the document), job description field (each document being associated with a categorical feature), and related queries (alternative each document being associated with a categorical feature) for training a decision tree).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “root node” classification influence on dependent nodes during training of decision tree algorithm building as taught by Farrell, to include a decision tree training set of documents with associated user click information, job description field, and search query as taught by Venkataraman in order to create a personalized trained decision tree for returning a ranked list of documents relevant to a specific user (Venkataraman, paragraphs 0033-0036, 0043-0044, 0048, 0052-0054, and Figs. 3 and 13).

Regarding claim 2, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein, 
for each given training object in the ordered list of training objects there is at least one of:
(i) a preceding training object that occurs before the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.) and 
(ii) a subsequent training object that occurs after the given training object (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing the set of training objects into an ordered list of training objects, the ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a subsequent training object that occurs after the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order after it.), and wherein 
the at least one of the categorical feature values are those categorical features values associated with training objects that appear earlier in the ordered list of training objects (Kimmel, paragraphs 0004-0006 teach “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (the at least one of the categorical features values are those categorical features values associated with training objects that appear earlier in the ordered list of training objects). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (ordered list of training objects is organized such that for each given training object in the ordered list of training objects there is at least one of) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (a preceding training object that occurs before the given training object)”. In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order before it.).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Kimmel’s teachings of “sorting the [training] data samples in consecutive ascending order” for decision tree training and ensemble generation into Milton’s teaching of training a decision tree and computing “each leaf node” training error in order to “achieve best results in decision making” through organizing large training datasets (Kimmel, paragraphs 0004-0006).
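
For illustration only, the preceding/subsequent relationship recited in claim 2 can be sketched as a split of the ordered list around a given position. This is a minimal sketch under that reading, not an implementation from the application or from Kimmel; the helper names `split_around` and `earlier_values` and the (categorical value, event indicator) pair layout are hypothetical.

```python
def split_around(ordered_objects, index):
    """Return (preceding, subsequent) objects relative to the object at index."""
    return ordered_objects[:index], ordered_objects[index + 1:]


def earlier_values(ordered_objects, index):
    """Categorical values of the objects that appear earlier than the given one."""
    preceding, _subsequent = split_around(ordered_objects, index)
    return [category for category, _event in preceding]


# Hypothetical ordered list of (categorical_value, event_indicator) pairs.
sample = [("a", 1), ("b", 0), ("c", 1)]
print(split_around(sample, 1))    # ([('a', 1)], [('c', 1)])
print(earlier_values(sample, 2))  # ['a', 'b']
```

In any list of two or more objects, every position has at least one of a preceding or a subsequent object, which is the "at least one of" condition as read here.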

Regarding claim 3, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein the generating the current numeric representation is executed for those prior categorical feature values that have been generated at the at least one prior level of the decision tree (Farrell, Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the “root node” classifications (those prior categorical feature values that have been generated at the at least one prior level of the decision tree) and corresponding numeric classifications of supervised training vectors (as mapped above) are passed on to each .
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 4, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein the generating the current numeric representation is executed for those prior categorical feature values that have been generated at the at least one prior level of the decision tree and at least one previous iteration of the decision tree (Farrell, Col. 1, lines 48-65, Col. 7, line 27-Col. 8, line 33, Col. 9, lines 31-67, Col. 10, line 60-Col. 11, line 22, and Fig. 5A-5B teach the previous “root node” classifications (those prior categorical feature values that have been generated at the at least one prior level of the decision tree and at least one previous iteration of the decision tree) and corresponding numeric classifications of supervised training vectors (as mapped above) are passed on to each “terminal” or “child” node to “recursively” perform (and at least one previous iteration of the decision tree) the “procedure” of the node classifying the supervised training vectors if the root node deems the vector appropriate to pass on and convert the result to a corresponding numeric value (the generating the current numeric representation is executed)).


Regarding claim 5, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein the event indicator has a pre-determined value, the pre-determined value being one of pre-determined outcomes of events are a positive outcome or a negative outcome (Venkataraman, paragraphs 0020, 0033-0036, 0043-0044, 0048, 0052-0054, and Figs. 3 and 13 teach building a training set of documents with associated user click information (event indicator) being “positive user actions” when clicking the document (event indicator has a pre-determined value, the pre-determined value being one of pre-determined outcomes of events are a positive outcome or a negative outcome) for training a decision tree).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 7, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein the set of training objects are organized into the ordered list of training objects at a point in time prior to the generating of the numeric representation (Kimmel, paragraph 0006 teaches “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (the set of training objects are organized into the ordered list of training objects) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (the set of training objects are organized into the ordered list of training objects)”. Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (wherein the set of training objects are organized into the ordered list of training objects at a point in time prior to the generating of the numeric representation). In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order and then split function values are determined for each node corresponding to the training data (wherein the set of training objects are organized into the ordered list of training objects at a point in time prior to the generating of the numeric representation).).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Regarding claim 8, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach further comprising organizing a plurality of sets of ordered lists of training objects and wherein the method further comprises, prior to the generating of the numeric representation selecting a given one of the plurality of sets of ordered lists (Kimmel, paragraphs 0004-0006, 0017, 0020, and 0059 teach using a subset of the selecting a given one of the plurality of sets of ordered lists), wherein “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes. The data subsets are distributed to a plurality of slave processing units after sorting the data samples (organizing a plurality of sets of ordered lists of training objects and wherein the method further comprises) in consecutive ascending order by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level (organizing a plurality of sets of ordered lists of training objects and wherein the method further comprises). Further, “Split functions information of each tree node comprises a pair of data attribute and a threshold value that together provide best reduction in impurity for a respective tree node” (wherein the method further comprises, prior to the generating of the numeric representation selecting…). In other words, given a sample in the training dataset, other training samples are taught to be sorted in a consecutive ascending order, selected, and then split function values are determined for each node corresponding to the training data (wherein the method further comprises, prior to the generating of the numeric representation selecting…).).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.

Regarding claim 10, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; and further teach wherein the training objects are not associated with an inherent temporal order, and wherein the ordered list of training objects is generated in accordance with a pre-determined rule (Kimmel, paragraphs 0004-0006 teach “During each tree level iteration a plurality of training data samples is received by a distributed processing control unit, the training data samples include a plurality of data subsets, each corresponding to one of a plurality of attributes (set of training objects are not associated with an inherent temporal order). The data subsets are distributed to a plurality of slave processing units after sorting the data samples (the ordered list of training objects is generated) in consecutive ascending order (in accordance with a pre-determined rule) by updating a first index identifying the trajectories of the training data samples through tree nodes of the previous tree level”).
Milton, Kimmel, Farrell, and Venkataraman are combinable for the same rationale as set forth above with respect to claims 1 and 2.
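
For illustration only, the limitation of claim 10 (an ordered list generated "in accordance with a pre-determined rule" for training objects that carry no timestamp) can be sketched as a deterministic sort. The class `TrainingObject`, its fields, and the particular ascending-order rule are hypothetical names assumed for this sketch; they are not taken from Kimmel or from the application, and the sketch does not characterize how either actually orders data.

```python
from typing import List, NamedTuple


class TrainingObject(NamedTuple):
    # Hypothetical fields, assumed for illustration; note there is no timestamp.
    object_id: str
    attribute: float


def order_by_rule(objects: List[TrainingObject]) -> List[TrainingObject]:
    """Order objects by an assumed pre-determined rule:
    ascending attribute value, with ties broken by identifier."""
    return sorted(objects, key=lambda o: (o.attribute, o.object_id))


samples = [
    TrainingObject("c", 2.0),
    TrainingObject("a", 1.0),
    TrainingObject("b", 2.0),
]
ordered = order_by_rule(samples)
ordered_ids = [o.object_id for o in ordered]
```

Because the rule is fixed in advance, the same set of objects always yields the same ordered list, regardless of the order in which the objects were received.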

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub 20140214736) hereinafter Kimmel, in view of Farrell et al (US Patent 5657424) hereinafter Farrell, in view of Venkataraman et al (US Pub 20180232375) hereinafter Venkataraman, in view of Fano et al (US Pub 20050189415) hereinafter Fano.
Regarding claim 9, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; however the combination does not explicitly teach wherein the training objects are associated with an inherent temporal order, and wherein the ordered list of training objects comprises the training objects ordered in accordance with the temporal order.
Fano teaches wherein the training objects are associated with an inherent temporal order, and wherein the ordered list of training objects comprises the training objects ordered in accordance with the temporal order (paragraphs 0159-0160 teach training decision tree classifier with a training set in which the examples are in “temporal order” (wherein the training objects are associated with an inherent temporal order, and wherein the ordered list of training objects comprises the training objects ordered in accordance with the temporal order)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “root node” classification influence on dependent nodes during training of decision tree algorithm building as taught by Farrell, as modified by decision tree training set of documents with associated user click information, job description field, and search query as taught by Venkataraman, to include training decision tree classifier with a training set in which the examples are in “temporal order” as taught by Fano in order to increase accuracy of a prediction based on times related to specific training data (Fano, paragraphs 0052 and 0160).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Milton et al (US Pub 20150199699) hereinafter Milton, in view of Kimmel et al (US Pub .
Regarding claim 11, the combination of Milton, Kimmel, Farrell, and Venkataraman teach all the claim limitations of claim 1 above; however the combination does not explicitly teach wherein the training objects are not associated with an inherent temporal order, and wherein the ordered list of training objects is generated based on a random order of the training objects to be used as the ordered list.
Hong teaches wherein the training objects are not associated with an inherent temporal order, and wherein the ordered list of training objects is generated based on a random order of the training objects to be used as the ordered list (paragraphs 0027 and 0061 teach training models including “decision trees” on a “randomly permuted” order of “training set” with no timestamps (wherein the set of training objects is not associated with an inherent temporal order, and wherein the ordered list of training objects is generated based on a random order of the training objects to be used as the ordered list)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify training a decision tree and computing “each leaf node” training error, as taught by Milton as modified by “sorting the [training] data samples in consecutive ascending order” for decision tree training as taught by Kimmel, as modified by “root node” classification influence on dependent nodes during training of decision tree algorithm building as taught by Farrell, as modified by decision .


Claims 13-21 are rejected under 35 U.S.C. 103 as being unpatentable over Farrell et al (US Patent 5657424) hereinafter Farrell, in view of McFall et al (US Pub 20200327252) hereinafter McFall.
Regarding claim 13, Farrell teaches a method of generating a split value for a node in a decision tree of a decision tree model used by a Machine Learning Algorithm (MLA), the split value being for a node at a particular level of the decision tree (Col. 8, line 34-Col. 9, line 29 and equation 1 teach recursively splitting nodes in a decision tree classifier algorithm (of a decision tree model used by a Machine Learning Algorithm (MLA)) according to a calculated dividing plane value (generating a split value for a node in a decision tree) for an area of plotted vectors for nodes at different levels of the tree (the split value being for a node at a particular level of the decision tree)), the node for classifying an object having a categorical feature value that is to be translated into a numeric value representative thereof (Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22 teach at each node in specific levels of a decision tree, classifying supervised training vectors (the node for classifying an object having a categorical feature value)), the split value is for causing the object to be classified in one child node of a plurality of child nodes of the node based on the numeric value and the split value, the MLA executable by an electronic device to predict a value for an in-use object (Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach a processor training a decision tree classifier algorithm that includes recursively splitting nodes in a decision tree according to a calculated dividing plane value (the split value is for causing) for an area of plotted vectors for nodes at different levels of the tree, and the splitting creating “child nodes” of a “root” or “non-terminal” node that properly classify the supervised training vectors (the object to be classified in one child node of a plurality of child nodes of the node… the MLA executable by an electronic device to predict a value for an in-use object), thus the “root” or “non-terminal” node classifications and corresponding numeric match classifications determining which split child node to send the training vector (based on the numeric value and the split value)), the method comprising:
generating a range of all possible values of the categorical features (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach nodes are split according to a process, wherein “the oval area T represents a region in which all of the data vectors representative of the target word (categorical features) are found (generating a range of [some] possible values), while all of the other vectors (categorical features) are found in a space represented by the oval NT”);
applying a grid to the range to separate the range into region, each region having a boundary (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4  teach defining “a determinant plane” to a plotted area (grid) of plotted vector regions of all values (applying a grid to the range) to divide the regions into associated node areas (to separate the range into region) that are depicted as having a limited distance from the centroid within the plotted area and the plane (each region having a boundary)); 
using the boundary as the split value (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value)); 
the generating and the applying being executed before the categorical feature value is translated into the numeric representation thereof (Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach splitting nodes and forming a decision tree in order to then (the generating and the applying being executed before) properly classify the training vectors (categorical feature value) and convert the result to a numeric value (representation) in addition to a confidence value of the classification (alternative numeric representation)).

a grid (see mapping above), however McFall teaches a grid (paragraphs 0482-0491 teach splitting a decision tree node according to a node’s corresponding data being divided into ranges with boundaries of “approximately equal population” (a grid utilization)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement McFall’s teachings of splitting decision tree nodes according to a data range division and boundaries into Farrell’s teaching of parent node classification influence on dependent nodes and node splitting operations during training of decision tree algorithm building in order to increase classification accuracy of a decision model through developing specific classification nodes (McFall, paragraphs 0482-0491).
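
For illustration only, the sequence of steps recited in claim 13 (generating a range, applying a grid to separate the range into regions, and using a region boundary as the split value) can be sketched as follows. The function names `grid_boundaries` and `child_for`, the regular grid spacing, and the range of zero to one are assumptions made for this sketch; they are not taken from Farrell, McFall, or the application, and the sketch is not a characterization of any of them.

```python
from typing import List


def grid_boundaries(low: float, high: float, n_regions: int) -> List[float]:
    """Return the interior boundaries of a regular grid that separates
    the range [low, high] into n_regions regions of equal width."""
    if n_regions < 1 or high <= low:
        raise ValueError("need high > low and at least one region")
    step = (high - low) / n_regions
    return [low + i * step for i in range(1, n_regions)]


def child_for(numeric_value: float, split_value: float) -> str:
    """Route an object to one of two child nodes by comparing its
    numeric value against the split value (assumed binary split)."""
    return "left" if numeric_value <= split_value else "right"


# The boundaries depend only on the range and the grid, so they can be
# computed before any categorical value is translated into a number.
candidate_splits = grid_boundaries(0.0, 1.0, 4)

# Later, once a numeric value is available, a boundary acts as the split.
chosen_child = child_for(0.3, candidate_splits[1])
```

In this sketch the grid is fixed up front, which is one way to read the claimed ordering in which the generating and the applying occur before the translation into a numeric representation.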

Regarding claim 14, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the grid has a pre-determined format (Farrell, Col. 8, line 34-Col. 9, line 29 and equation 1 teach the plane applied to a plotted area (grid) being greater than “0” (has a pre-determined format)).

Regarding claim 15, the combination of Farrell and McFall teach all the claim limitations of claim 14 above; and further teach wherein the grid is one of a regular interval grid and an irregular interval grid (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach a plane dividing vectors plotted in an area, thus creating .

Regarding claim 16, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the range is between zero and one (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach nodes are split according to a plotted area and divided “region[s]”, wherein the regions represent a match or not match for the supervised training vectors and converted to representative numeric range of match “1” or not match “0” (range is between zero and one)).

Regarding claim 17, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the numeric representations of the categorical feature values are calculated using a predefined constant value Rconstant and wherein the range is between Rconstant and 1+(Rconstant) (Farrell, Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach at each node in specific levels of a decision tree, classifying supervised training vectors (categorical feature values) and converting the result (calculated using) to a numeric value (wherein the numeric representations of the categorical feature values are calculated), wherein the conversion includes a resultant not match being “0” (wherein the range is between Rconstant) or match being “1” (and 1+(Rconstant)) with the minimum being a value of 0 (are calculated using a predefined constant value Rconstant)).

Regarding claim 18, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the method further comprises, during an in-use phase, for a given counter value representing a categorical feature, determining a bucket defined by the grid the given counter value falls into and using boundaries associated with the bucket as values for splits (Farrell, Col. 4, lines 3-48, Col. 7, line 27-Col. 8, line 33, Col. 8, line 34-Col. 9, line 67, Col. 10, line 60-Col. 11, line 22, equation 1, and Fig. 4 teach while training and also executing the decision tree for classifying vectors (during an in-use phase), plotting the vectors in an area with a calculated dividing plane (and using boundaries associated with the bucket as values for splits) and determining “the total number of vectors of both classes (for a given counter value representing a categorical feature, determining a bucket defined by the grid the given counter value falls into) assigned to the terminal node”).
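
For illustration only, the in-use step recited in claim 18 (determining which grid bucket a given counter value falls into and using that bucket's boundaries as split values) can be sketched as a sorted-edge lookup. The function name `bucket_bounds`, the example edges, and the convention that a value lying exactly on an edge belongs to the lower bucket are assumptions made for this sketch; they are not taken from Farrell, McFall, or the application.

```python
from bisect import bisect_left
from typing import List, Tuple


def bucket_bounds(counter_value: float, edges: List[float]) -> Tuple[float, float]:
    """Return the (lower, upper) boundaries of the grid bucket that
    contains counter_value. `edges` must be sorted and must include
    both ends of the range."""
    if len(edges) < 2:
        raise ValueError("need at least two edges to define a bucket")
    if not edges[0] <= counter_value <= edges[-1]:
        raise ValueError("counter value lies outside the grid range")
    # Index of the first edge that is >= counter_value; clamp to 1 so that
    # the lowest edge itself is assigned to the first bucket.
    upper_index = max(bisect_left(edges, counter_value), 1)
    return edges[upper_index - 1], edges[upper_index]


# Assumed regular grid over the range zero to one.
grid_edges = [0.0, 0.25, 0.5, 0.75, 1.0]
lower, upper = bucket_bounds(0.3, grid_edges)
```

The returned pair gives the two boundaries associated with the bucket, either of which could then serve as a candidate split value.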

Regarding claim 19, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the using the boundary as the split value is executed for each level of the decision tree and wherein the method further comprises, once a given level of the decision tree is trained, re-calculating the split value (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value). The splitting process is taught to be performed .

Regarding claim 20, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the decision tree model comprises a plurality of decision trees, wherein the using the boundary as the split value is executed for each decision tree of the plurality of decision trees, and wherein the method further comprises, once each decision tree of the plurality of decision trees is trained, re-calculating the each split value (Farrell, Col. 7, lines 27-43, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value). The splitting process is taught to be performed “recursively” for nodes in each “level” of a decision tree for the “decision trees” (is executed for each decision tree of the plurality of decision trees), wherein the boundaries are replotted at each training iteration of each level of a decision tree, and further for training each tree of the “decision trees for each of the target words” (wherein the decision tree model comprises a plurality of decision trees/wherein the method further comprises, once each decision tree of the plurality of decision trees is trained, re-calculating the each split value)).

Regarding claim 21, the combination of Farrell and McFall teach all the claim limitations of claim 13 above; and further teach wherein the using the boundary as the split value is executed during training of the MLA and wherein the training of the MLA, during a given iteration of one of: (i) a given level of the decision tree and (ii) a given iteration of the decision tree (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid (boundary) and from the dividing plane (boundary) in order to “split” nodes with corresponding vector value classification regions (using the boundary as the split value). The splitting process is taught to be performed “recursively” for nodes in each “level” of a decision tree (a given level of the decision tree), wherein the boundaries are replotted at each training iteration of each level of a decision tree classifier algorithm (is executed during training of the MLA and wherein the training of the MLA, during a given iteration of one of: (i) a given level of the decision tree and (ii) a given iteration of the decision tree)), comprises:
selecting a best value of a categorical feature to be placed at the given iteration and a best value of the split associated therewith (Farrell, Col. 8, line 34-Col. 9, line 29, equation 1, and Fig. 4 teach the divided regions associated with node areas that are depicted as having a limited distance from the centroid and from the dividing plane in order to “split” nodes with corresponding vector value classification regions (selecting…a best value of the split associated therewith). The splitting process is taught to be performed “recursively” (iteratively) for nodes in each “level” of a decision tree, wherein the boundaries are replotted at each training iteration of each level of a decision tree .

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123