Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in the present application.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 analysis:
In the instant case, the claims are directed to a method (claims 1-10) and an article of manufacture (claims 11-20). Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

Step 2A analysis:
Because the claims fall within one of the four statutory categories (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). The abstract idea groupings at issue here are “mental processes” (including an observation, evaluation, judgement, and opinion) and “mathematical concepts” (including mathematical calculations).

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1 and 11:
“identifying a first plurality…” (observation);
“transforming the first feature…” (mathematical calculation);
“generating a cross feature…” (mathematical calculation);
“estimating a predictive…” (mathematical calculation);
“adding the predictive power to a set…” (observation);
“selecting…highest estimated…” (observation);
“splitting… to generate…” (mathematical calculation);
“…performed by… computing devices.” (generic computer).

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional elements in claims 1 and 11, “computing devices,” “storage media storing instructions,” and “one or more processors,” amount to mere instructions to implement an abstract idea or other exception on a generic computer. Applying an otherwise abstract idea on a generic computer does not integrate the judicial exception into a practical application (MPEP 2106.05(f)).
Step 2B analysis:
The limitation “wherein the method is performed by one or more computing devices” is merely the application of the abstract idea to a generic computer. Thus, the claims as a whole do not amount to significantly more than the judicial exception.

Step 2A, Prong 1 analysis:
Claims 2 and 12:
“wherein the fourth feature comprises…” (mathematical calculation);

Step 2A: Prong 2 analysis:
The further limitations in claims 2 and 12 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 3 and 13:
“determining a minimum resolution…” (judgement);
“determining a minimum value…” (judgement);
“determining a maximum value…” (judgement);
“wherein identifying…” (evaluation);

Step 2A: Prong 2 analysis:
The further limitations in claims 3 and 13 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 4 and 14:
“identifying a particular bucket…” (judgement);
“generating a first bucket…” (evaluation);
“wherein a first boundary…first bucket…” (judgment);
“wherein a second boundary…first bucket…” (evaluation);
“wherein a first boundary…second bucket…” (evaluation);
“wherein a second boundary…second bucket…” (judgement);


Step 2A: Prong 2 analysis:
The further limitations in claims 4 and 14 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 5 and 15:
“wherein estimating…comprises calculating…” (mathematical calculation);

Step 2A: Prong 2 analysis:
The further limitations in claims 5 and 15 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 6 and 16:
“removing the first split from…possible splits” (evaluation);

Step 2A: Prong 2 analysis:
The further limitations in claims 6 and 16 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 7 and 17:
“transforming the fourth feature…” (mathematical calculation);
“generating a second cross feature…” (mathematical calculation);
“estimating a second predictive…” (mathematical calculation);
“adding the second predictive power to a second set…” (observation);
“selecting…highest estimated…” (observation);
“splitting… to generate…” (mathematical calculation);

Step 2A: Prong 2 analysis:
The further limitations in claims 7 and 17 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 8 and 18:
“generating a first estimate…” (mathematical calculation);
“generating a second estimate…” (mathematical calculation);
“determining whether…difference…less than a threshold…” (judgement);
“using…feature when training a model…in response to determining…” (repetitive calculation);

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional element in claims 8 and 18, “using the first cross feature or the third cross feature as a feature when training a model in response to determining that the difference between the first estimate and the second estimate is less than the threshold value,” is merely insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
The limitation “using the first cross feature or the third cross feature as a feature when training a model in response to determining that the difference between the first estimate and the second estimate is less than the threshold value” is the performance of a repetitive calculation, which the courts have recognized as well-understood, routine, and conventional (MPEP 2106.05(d)(II)).


Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 9 and 19:
“wherein a count has a particular value…” (observation);
“incrementing the count” (mathematical calculation);
“after selecting… determining whether the count equals a threshold value” (mathematical calculation);
“in response… transforming the fourth feature…” (mathematical calculation);
“incrementing the count” (mathematical calculation);
“after selecting… determining whether the count equals a threshold value” (mathematical calculation);
“in response...using…feature when training a model” (repetitive calculation);

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional element in claims 9 and 19, “in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model,” is merely insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
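The limitation “in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model” is the performance of a repetitive calculation, which the courts have recognized as well-understood, routine, and conventional (MPEP 2106.05(d)(II)).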

Step 2A, Prong 1 analysis:
Claims 10 and 20:
“wherein the first feature is…” (observation);

Step 2A: Prong 2 analysis:
The further limitations in claims 10 and 20 are directed to a judicial exception and nothing more. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 6-11, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over “AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications” to Luo et al. (hereinafter, Luo) in view of “Use of Contextual Information for Feature Ranking and Discretization” to Hong (hereinafter, Hong).

As per claim 1, Luo teaches A method comprising: identifying a first plurality of possible splits of a first feature that is a numeric feature (4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Fig. 5.);
for each split of the first plurality of possible splits: transforming the first feature into a second feature based on said each split (4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Fig. 5, See transformation of original numerical feature into 1st discretized feature.);
generating a cross feature based on the second feature and a third feature that is different than the first feature and the second feature (4.2 Feature Generation, p4-5, “In AutoCross, we consider a tree-structured space T depicted in Figure 3, where each node corresponds to a feature set and the root is the original feature set F. For simplicity, in this example, we denote the crossing of two features A and B as AB, and higher-order cross features in similar ways. For a node (a feature set), its each child is constructed by adding to itself one pair-wise crossing of its own elements. The pair-wise interactions between cross features (or a cross feature and an original feature) will lead to high-order feature crossing… First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next” Figure 3, Algorithm 1. Examiner Note: Luo processes numeric features into a second feature prior to crossing, as seen in 4.4 Preprocessing. In 4.2 and Figure 3, Luo then crosses features.);
estimating a predictive power of the cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power.);
splitting the first feature based on the first split to generate a fourth feature that is different than the first feature (Figure 5, 2nd discretized feature. 4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Examiner Note: The 2nd discretized feature is created by the first split (1st discretized feature) and is different from the first feature.);
wherein the method is performed by one or more computing devices (Abstract, “Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross.”).
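The Luo passages quoted above for the identifying, transforming, and generating steps describe equal-width discretization of a numeric feature at several granularities, followed by pair-wise crossing of features. The following Python sketch illustrates those two operations under stated assumptions only; it is not code from Luo, and the function names, the bin counts, and the string encoding of crossed categories are hypothetical.

```python
from typing import Dict, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each numeric value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # constant feature: everything in one bucket
    # Clamp so the maximum value lands in the last bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def multi_granularity(
    values: Sequence[float], bin_counts: Sequence[int]
) -> Dict[int, List[int]]:
    """One categorical feature per granularity (hypothetical bin counts)."""
    return {k: equal_width_bins(values, k) for k in bin_counts}


def cross(a: Sequence[object], b: Sequence[object]) -> List[str]:
    """Pair-wise cross: each (a, b) value pair becomes one new category."""
    return [f"{x}_{y}" for x, y in zip(a, b)]
```

For example, multi_granularity([18, 25, 33, 47, 52, 64], [2, 4]) yields {2: [0, 0, 0, 1, 1, 1], 4: [0, 0, 1, 2, 2, 3]}; each entry is a separate categorical feature that could then be crossed with another feature using cross().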

Luo does not explicitly teach adding the predictive power to a set of estimated predictive powers, and selecting a first cross feature that is associated with the highest estimated predictive power in the set of estimated predictive powers.

Hong teaches adding the predictive power to a set of estimated predictive powers (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “|”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 | 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance. However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later. Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set.);
selecting a first cross feature that is associated with the highest estimated predictive power in the set of estimated predictive powers, wherein the first cross feature corresponds to a first split in the first plurality of possible splits (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “|”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 | 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance (see Fig. 4, selected features, 4.3 “After a candidate is selected to replace the current solution S∗ (Step 6, Algorithm 1), we train an LR model with the new S∗, evaluate its performance…”). However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later (see Luo Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows”). Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set.).
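The distinction drawn above between Luo and Hong concerns only whether the intermediate scores are retained. As a hedged illustration, the two selection styles can be contrasted in Python as follows. The function names are hypothetical, and the scoring function is an arbitrary stand-in for an estimate of predictive power, not Luo's logistic-regression evaluation or Hong's contextual merit.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def select_by_running_best(
    candidates: Iterable[Hashable], score: Callable[[Hashable], float]
) -> Tuple[Optional[Hashable], float]:
    """Retain only the best candidate seen so far, replacing it when beaten."""
    best, best_score = None, float("-inf")
    for candidate in candidates:
        s = score(candidate)
        if s > best_score:  # a candidate with greater performance replaces the stored one
            best, best_score = candidate, s
    return best, best_score


def select_from_score_set(
    candidates: Iterable[Hashable], score: Callable[[Hashable], float]
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Add every estimate to a collection of scores, then select the highest."""
    scores = {candidate: score(candidate) for candidate in candidates}
    best = max(scores, key=scores.__getitem__)  # raises ValueError if empty
    return best, scores
```

With toy scores {"split_a": 0.61, "split_b": 0.74, "split_c": 0.69}, both functions select "split_b"; only the second also returns every estimated score for later comparison.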

Luo and Hong are analogous art because they are both directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy of the system, which can be accomplished through selection of high merit features (Hong, Application Experience, p. 728, “The contextual feature analysis and discretization algorithms described in this paper are incorporated into our data abstraction system called RAMP… For these real cases, the resultant accuracy of the RAMP generated rules were either comparable or higher than those obtained from SWAP1, C4.5, or CART, some significantly.”).

As per claim 3, the combination of Luo and Hong thus far teaches The method of claim 1.

Luo does not explicitly teach further comprising determining a minimum resolution of the first feature; determining a minimum value of the first feature; determining a maximum value of the first feature; wherein identifying the first plurality of possible splits is based on the minimum resolution, the minimum value, and the maximum value.

Hong teaches further comprising: determining a minimum resolution of the first feature; determining a minimum value of the first feature (Section 8, p725, “When a numeric feature Xk is discretized, the new component distance becomes 1 if and only if there is a cut point between the pair of values, xki and xkj, and 0 otherwise…An entry in the SPANk is an interval on the value line of Xk, specified by its beginning and ending points and a weight value which is the approximate merit contribution should the interval be cut.” NFD2) Perform CM to obtain the contextual merits and SPAN. Examiner Note: Hong’s setting of component distance based on whether the numeric feature is to be cut is seen as equivalent to determining the minimum resolution of the numeric feature (either 0 in the case of no cut, or 1 in the case of at least one cut). In Hong’s Numeric Feature Discretization algorithm, Hong determines the SPAN of each numeric feature. Per section 8, the span of a feature includes both the minimum and maximum of that feature. As such, determining the SPAN of a feature is seen as equivalent to determining the minimum value of that feature.);
determining a maximum value of the first feature (Section 8, p725, “An entry in the SPANk is an interval on the value line of Xk, specified by its beginning and ending points and a weight value which is the approximate merit contribution should the interval be cut.” NFD2) Perform CM to obtain the contextual merits and SPAN. Examiner Note: In Hong’s Numeric Feature Discretization algorithm, Hong determines the SPAN of each numeric feature. Per section 8, the span of a feature includes both the minimum and maximum of that feature. As such, determining the SPAN of a feature is seen as equivalent to determining the maximum value of that feature.);
wherein identifying the first plurality of possible splits is based on the minimum resolution, the minimum value, and the maximum value (Section 8, p. 725, “The SPAN list thus obtained is used to discretize all the numeric features.” NFD5.2) Perform IC: Monitor the progress…).
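The cited Hong passages do not give a formula for turning a minimum resolution, a minimum value, and a maximum value into possible splits. Purely to illustrate how those three quantities can bound a set of candidate split points, the Python sketch below adopts one hypothetical reading: the minimum resolution is taken as the smallest gap between distinct observed values, and the candidate splits are the interior points of a grid stepping from the minimum toward the maximum. This reading, and the function names, are assumptions and are not attributed to Hong or to the instant application.

```python
from typing import List, Sequence


def minimum_resolution(values: Sequence[float]) -> float:
    """Hypothetical reading: smallest positive gap between distinct values."""
    distinct = sorted(set(values))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return min(gaps) if gaps else 0.0


def candidate_splits(values: Sequence[float]) -> List[float]:
    """Grid points min + k * resolution lying strictly between min and max."""
    lo, hi = min(values), max(values)
    res = minimum_resolution(values)
    if res == 0:
        return []  # fewer than two distinct values: no possible split
    splits = []
    k = 1
    # Recompute lo + k * res each step to avoid accumulating rounding error.
    while lo + k * res < hi:
        splits.append(lo + k * res)
        k += 1
    return splits
```

For example, candidate_splits([1, 2, 4, 7]) returns [2, 3, 4, 5, 6] under this reading.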

Luo and Hong are analogous art because they are both directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature resolution system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy of the system, which can be accomplished through selection of feature resolution (Hong, Application Experience, p. 728, “The contextual feature analysis and discretization algorithms described in this paper are incorporated into our data abstraction system called RAMP… For these real cases, the resultant accuracy of the RAMP generated rules were either comparable or higher than those obtained from SWAP1, C4.5, or CART, some significantly.”).

As per claim 6, the combination of Luo and Hong thus far teaches The method of claim 1.

Luo teaches further comprising: removing the first split from the first plurality of possible splits to create a second plurality of possible splits (4.2 Feature Set Generation, p. 4 “We consider the feature crossing problem (Problem (4)). Assume the size of the original feature set is d, which is also the highest order of cross features… the number of all possible feature sets is 2^(2^d−1), a double exponential function of d” Algorithm 1. 4.5 Termination, “Three kinds of termination conditions are used in AutoCross: 1) runtime condition…2) performance condition…3) maximal feature number” Examiner Note: Luo applies feature discretization to any numerical features that it processes. Luo additionally processes a plurality of features. When Luo’s system is .

As per claim 7, the combination of Luo and Hong thus far teaches The method of claim 1.

Luo teaches further comprising: for each split of a second plurality of possible splits of the fourth feature (Figure 5, 2nd and 3rd discretized feature. 4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Examiner Note: The 2nd discretized feature is seen as equivalent to a fourth feature different from the first feature. The 3rd discretizations 0-3 are possible splits of that fourth feature.):
transforming the fourth feature into a fifth feature based on said each split (Figure 5, 3rd discretized feature. Examiner Note: The transformation of the 2nd discretized feature into the 3rd discretized feature is seen as equivalent to transforming the fourth feature into a fifth feature based on the possible splits of the feature.);
generating a second cross feature of the fifth feature and the third feature (4.2 Feature Generation, p4-5, “In AutoCross, we consider a tree-structured space T depicted in Figure 3, where each node corresponds to a feature set and the root is the original feature set F. For simplicity, in this example, we denote the crossing of two features A and B as AB, and higher-order cross features in similar ways. For a node (a feature set), its each child is constructed by adding to itself one pair-wise crossing of its own elements. The pair-wise interactions between cross features (or a cross feature and an original feature) will lead to high-order feature crossing… First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next” Figure 3, Algorithm 1.);
estimating a second predictive power of the second cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power.);
splitting the fourth feature based on the second split to generate a sixth feature that is different than the first, third, and fourth features (Fig. 5, 3rd discretized feature. Examiner Note: The 3rd discretized features are generated from the 2nd split (2nd discretized feature) and are different from the other created features.).

Luo does not explicitly teach adding the second predictive power to a second set of estimated predictive powers, and selecting a third cross feature that is associated with the highest estimated predictive power in the second set of estimated predictive powers.

Hong teaches adding the second predictive power to a second set of estimated predictive powers (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “|”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 | 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance. However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later. Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set.);
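selecting a third cross feature that is associated with the highest estimated predictive power in the second set of estimated predictive powers, wherein the third cross feature corresponds to a second split in the second plurality of possible splits (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “|”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 | 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance (see Fig. 4, selected features, 4.3 “After a candidate is selected to replace the current solution S∗ (Step 6, Algorithm 1), we train an LR model with the new S∗, evaluate its performance…”). However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later (see Luo Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows”). Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set. The system would then select the crossed feature associated with the highest performance. When a numerical feature is discretized according to Luo 4.4 Preprocessing as described above, the cross…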

Luo and Hong are analogous art because they are both directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy of the system, which can be accomplished through selection of high merit features (Hong, Application Experience, p. 728, “The contextual feature analysis and discretization algorithms described in this paper are incorporated into our data abstraction system called RAMP… For these real cases, the resultant accuracy of the RAMP generated rules were either comparable or higher than those obtained from SWAP1, C4.5, or CART, some significantly.”).

As per claim 8, the combination of Luo and Hong thus far teaches The method of claim 7.

Luo teaches further comprising: generating a first estimate of predictive power of the first cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power.);
generating a second estimate of predictive power of the third cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power. Luo specifically evaluates all candidate features.).

Luo does not explicitly teach determining whether a difference between the first estimate and the second estimate is less than a threshold value; using the first cross feature or the third cross feature as a feature when training a model in response to determining that the difference between the first estimate and the second estimate is less than the threshold value.

Hong teaches determining whether a difference between the first estimate and the second estimate is less than a threshold value (Section 4, p. 720 “We now define a distance measure Dij between two examples similar to many such measures in the literature, but with a slight modification: [Eq. 1] where for a symbolic feature, Xk, its component distance, [Eq. 2] and for a numeric feature, [Eq. 3] The value, tk, is a feature dependent threshold, designed to capture the notion that if the difference is “big enough” the two values are likely to be different when discretized, hence the component distance of 1. However, if the difference is less than the threshold, its distance is a fraction as shown above. This formulation coincides with the probability of an interval of length |xki - xkj| having its ends in two distinct bins, if the range of the feature were cut ;
using the first cross feature or the third cross feature as a feature when training a model in response to determining that the difference between the first estimate and the second estimate is less than the threshold value (Section 4, p. 720 “We now define a distance measure Dij between two examples similar to many such measures in the literature, but with a slight modification: [Eq. 1] where for a symbolic feature, Xk, its component distance, [Eq. 2] and for a numeric feature, [Eq. 3] The value, tk, is a feature dependent threshold, designed to capture the notion that if the difference is “big enough” the two values are likely to be different when discretized, hence the component distance of 1. However, if the difference is less than the threshold, its distance is a fraction as shown above. This formulation coincides with the probability of an interval of length |xki - xkj| having its ends in two distinct bins, if the range of the feature were cut equally into consecutive bins of size tk. We will discuss how we determine the tk values in a later section” Examiner Note: Luo teaches using generated cross features to train a model. Hong teaches comparing cross features based on a threshold as shown above (see also Luo, 4.3 Feature Set Evaluation, “A vital step in Algorithm 1 is to evaluate the performance of candidate feature sets (Step 4). Here, the performance of a candidate set S is expressed as E(L(Dtr, S), Dvld, S) (see Problem (4)), denoted as E(S) for short. To directly estimate it, we need to learn a model with algorithm L on the training set represented by S and evaluate its performance on the validation set. Though highly accurate, direct evaluation for feature sets is often rather expensive. In real-world business scenarios, training a model to convergence may take great computational resource”). When Hong is applied to .
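The following Python sketch is illustrative only and is not taken from Luo or Hong. It shows one way the claimed threshold test could be expressed: if two successive estimates of predictive power differ by less than a threshold, one of the two cross features is returned for use in training. Treating the "difference" as an absolute difference, and preferring the higher estimate when both qualify, are assumptions of the sketch; the function and feature names are hypothetical.

```python
from typing import Optional


def feature_if_converged(first: str, first_estimate: float,
                         third: str, third_estimate: float,
                         threshold: float) -> Optional[str]:
    """Return a cross feature to train with once two successive estimates are close.

    Assumption: "difference" is the absolute difference. Returns None when the
    difference is not below the threshold, i.e. a hypothetical search would
    continue refining splits.
    """
    if abs(first_estimate - third_estimate) < threshold:
        # Either feature satisfies the claim language; choosing the one with
        # the higher estimate is an arbitrary illustrative choice.
        return first if first_estimate >= third_estimate else third
    return None


print(feature_if_converged("cross_1", 0.300, "cross_3", 0.305, 0.01))  # cross_3
print(feature_if_converged("cross_1", 0.30, "cross_3", 0.40, 0.01))    # None
```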

Luo and Hong are analogous art because they are both directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system. The combination would have been obvious to one of ordinary skill in the art because such a person would have been motivated to increase accuracy of the system, which can be accomplished through selection of high merit features (Hong, Application Experience, p. 728, “The contextual feature analysis and discretization algorithms described in this paper are incorporated into our data abstraction system called RAMP… For these real cases, the resultant accuracy of the RAMP generated rules were either comparable or higher than those obtained from SWAP1, C4.5, or CART, some significantly.”).

As per claim 9, the combination of Luo and Hong thus far teaches The method of claim 7.

Luo teaches further comprising: incrementing a count, wherein the count has a particular value prior to selecting the first cross feature (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The count would inherently begin at 0 when no cross features have been generated.);
after selecting the first cross feature, determining whether the count equals a threshold value (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal ;
in response to determining that the count does not equal the threshold value, transforming the fourth feature into the fifth feature (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
incrementing the count (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
after selecting the third cross feature, determining whether the count equals the threshold value (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the .
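The following Python sketch is illustrative only. It models the count-based termination described in the mapping above (compare Luo 4.5, "maximal feature number"): a count starts at zero, is incremented after each cross feature is selected, and the loop stops when the count equals the threshold. The `select_next` callback is a hypothetical stand-in for one round of transforming, crossing, scoring, and selecting.

```python
from typing import Callable, List


def select_until_limit(select_next: Callable[[int], str], threshold: int) -> List[str]:
    """Select cross features until the count of selections equals `threshold`.

    `select_next` is a hypothetical callback for one selection round; it
    receives the current count. The last element of the returned list is the
    cross feature that would be used when training a model.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    count = 0  # value of the count before the first cross feature is selected
    selected: List[str] = []
    while True:
        selected.append(select_next(count))  # select the next cross feature
        count += 1                           # increment the count
        if count == threshold:               # stop at the maximal feature number
            return selected


print(select_until_limit(lambda i: f"cross_{i}", 3))  # ['cross_0', 'cross_1', 'cross_2']
```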

As per claim 10, the combination of Luo and Hong thus far teaches The method of claim 1.
Luo teaches wherein the first feature is a time-based feature and the second feature is a categorical feature (Section 2, Motivation, “While most early works of automatic feature generation focus on second-order interactions of original features [5, 6, 20, 22, 37], trends have appeared to consider higher-order (i.e., with order higher than two) interactions to make data more informative and discriminative [2, 27, 35, 44]. High-order cross features, just like other high-order interactions, can further improve the quality of data and increase predictive power of learning algorithms. For example, a third-order cross feature ‘item ⊗ time ⊗ region’ can be a strong feature to recommend regionally preferred food during certain festivals”).
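For illustration only, the following Python sketch shows what a cross of a time-based feature and a categorical feature can look like, in the spirit of Luo’s ‘item ⊗ time ⊗ region’ example: the timestamp is first discretized into an hour bucket, and the bucket index is then combined with the categorical value into a single cross-feature value. The bucket edges, the string encoding of the cross, and the function names are hypothetical.

```python
import bisect
from datetime import datetime
from typing import Sequence


def hour_bucket(ts: datetime, edges: Sequence[int]) -> int:
    """Discretize the hour of a timestamp into a bucket index.

    With edges [6, 12, 18]: hours 0-5 -> 0, 6-11 -> 1, 12-17 -> 2, 18-23 -> 3.
    """
    return bisect.bisect_right(edges, ts.hour)


def cross_time_category(ts: datetime, category: str, edges: Sequence[int]) -> str:
    """Combine a time bucket and a categorical value into one cross-feature value."""
    return f"{hour_bucket(ts, edges)}_x_{category}"


print(cross_time_category(datetime(2024, 5, 1, 8, 30), "north", [6, 12, 18]))  # 1_x_north
```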

Claim 11 is an article of manufacture claim corresponding to method claim 1. Claim 11 additionally requires “One or more storage media storing instructions which, when executed by one or more processors, cause…”, which Luo teaches (Abstract, “Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross.” Figure 2.). Claim 11 is rejected for the same reasons as claim 1.

Claim 13 is an article of manufacture claim corresponding to method claim 3. Claim 13 is rejected for the same reasons as claim 3.

Claim 16 is an article of manufacture claim corresponding to method claim 6. Claim 16 is rejected for the same reasons as claim 6.

Claim 17 is an article of manufacture claim corresponding to method claim 7. Claim 17 is rejected for the same reasons as claim 7.

Claim 18 is an article of manufacture claim corresponding to method claim 8. Claim 18 is rejected for the same reasons as claim 8.

Claim 19 is an article of manufacture claim corresponding to method claim 9. Claim 19 is rejected for the same reasons as claim 9.

Claim 20 is an article of manufacture claim corresponding to method claim 10. Claim 20 is rejected for the same reasons as claim 10.



Claims 2, 4, 5, 12, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over “AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications” to Luo et al. (hereinafter, Luo) in view of “Use of Contextual Information for Feature Ranking and Discretization” to Hong (hereinafter, Hong), further in view of CN 109101562 A to Zhou (hereinafter, Zhou).

As per claim 2, the combination of Luo and Hong thus far teaches The method of claim 1. 

Luo and Hong teach the splitting and discretization of numerical features, but do not explicitly teach wherein the fourth feature comprises one more bucket than the first feature.
Zhou teaches wherein the fourth feature comprises one more bucket than the first feature (Page 2, “The classification partition of the first specified number corresponding to the first feature, the plurality of selected sample into first sample of first specified number; screening to meet the target of the first pre-set condition, the first sample from each of said first sample, wherein the target first sample is one or more; a plurality of features in the target first sample comprises obtaining the influence information of the first sample of the target maximum second feature, the second feature and the first feature is different; The classification partition of the second specified number of the second character corresponding to the first sample of the target into a second sample of the second specified number;” Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting .

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s bucketing system. The combination would have been obvious to one of ordinary skill in the art because such a person would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and can make the working target has more pertinence, more obvious working effect. Therefore, accurately find the target population has actual application value in the large data.”).

As per claim 4, the combination of Luo and Hong thus far teaches The method of claim 1. 

Luo and Hong teach the splitting and discretization of numerical features, but do not explicitly teach the generation of buckets split from a feature.

Zhou teaches wherein transforming the first feature into the second feature based on the said each split comprises: identifying a particular bucket of the first feature that is to be split based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to ;
generating a first bucket and a second bucket based on the particular bucket; wherein a first boundary of the first bucket is the same as a first boundary of the particular bucket (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first ;
wherein a second boundary of the first bucket is based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: The particular bucket contains values from 0 to 100 and is divided into five buckets. Thus, the second boundary of the first bucket is placed at 20 in order to create the five buckets.);
wherein a first boundary of the second bucket is based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected ;
wherein a second boundary of the second bucket is the same as a second boundary of the particular bucket (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five .
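The following Python sketch is illustrative only and is not Zhou’s method. It represents a discretized feature as a sorted list of bucket boundaries and splits the bucket containing a given split point into two: the first new bucket keeps the particular bucket’s lower boundary and takes the split as its upper boundary, and the second new bucket takes the split as its lower boundary and keeps the particular bucket’s upper boundary. Each split therefore adds exactly one bucket (compare claim 2), so four splits of the range 0 to 100 yield the five age intervals of Zhou’s example. Half-open buckets and the function name are assumptions; Zhou’s open/closed endpoint notation is not modeled.

```python
import bisect
from typing import List


def split_bucket(edges: List[float], split: float) -> List[float]:
    """Split the bucket that contains `split` into two adjacent buckets.

    `edges` are sorted boundaries; bucket i is assumed to span [edges[i], edges[i+1]).
    The result has one more boundary, and hence one more bucket, than the input.
    """
    if not edges[0] < split < edges[-1]:
        raise ValueError("split must fall strictly inside the feature's range")
    if split in edges:
        raise ValueError("split coincides with an existing boundary")
    i = bisect.bisect_left(edges, split)  # position of the new inner boundary
    return edges[:i] + [split] + edges[i:]


edges = [0.0, 100.0]  # a single bucket covering 0 to 100
for s in (20.0, 40.0, 60.0, 80.0):
    edges = split_bucket(edges, s)
print(edges)  # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
```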

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s bucketing system. The combination would have been obvious to one of ordinary skill in the art because such a person would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and

As per claim 5, the combination of Luo and Hong thus far teaches The method of claim 1.

Luo and Hong teach estimating a cross feature’s performance, but do not specifically teach wherein estimating the predictive power of the cross feature comprises calculating an entropy of a label and an entropy of the label given the cross feature.

Zhou teaches wherein estimating the predictive power of the cross feature comprises calculating an entropy of a label and an entropy of the label given the cross feature (Page 12, “Each feature of the embodiment respectively the effect value of the overall amount, obtained by the information gain algorithm, by calculating each feature individually after adding calculating process, affecting the magnitude of the overall entropy to obtain the influence value. information gain algorithm is calculated as follows: g(D, A) = H(D) - H(D | A), wherein g(D, A) represents A feature affecting the magnitude of the overall entropy, H(D) represents the entropy of the preselected sample, H(D | A) represents entropy characteristic of sample A after dividing. The other embodiment of the application can be obtained by information gain respectively the influence value of each feature of the general information, the information gain by introducing penalty parameter correction, to reduce the influence of the small entropy of the small sample, namely, the information gain is penalty parameter * information gain.” Examiner Note: Luo teaches estimating the predictive power of a cross feature, but does not calculate the entropy of the feature or of the object when doing so. Zhou judges the value of a potential feature based on the entropy of both a label, and a label given the addition of a feature. When Zhou is applied to Luo, the resulting system .
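For illustration only, the following Python sketch computes the information gain quoted from Zhou, g(D, A) = H(D) - H(D | A), where H(D) is the entropy of the label and H(D | A) is the entropy of the label given the (cross) feature A. Base-2 logarithms are an assumption of the sketch, and Zhou’s penalty-parameter correction is deliberately omitted; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(D): Shannon entropy (base 2) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels: Sequence[Hashable], feature: Sequence[Hashable]) -> float:
    """H(D | A): label entropy within each feature value, weighted by group size."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for y, a in zip(labels, feature):
        groups.setdefault(a, []).append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(labels: Sequence[Hashable], feature: Sequence[Hashable]) -> float:
    """g(D, A) = H(D) - H(D | A), without any penalty-parameter correction."""
    return entropy(labels) - conditional_entropy(labels, feature)


# A cross feature that perfectly separates the labels gains the full 1 bit.
print(information_gain([1, 1, 0, 0], ["a", "a", "b", "b"]))  # 1.0
```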

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s entropy calculation. The combination would have been obvious to one of ordinary skill in the art because such a person would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and can make the working target has more pertinence, more obvious working effect. Therefore, accurately find the target population has actual application value in the large data.”).

Claim 12 is an article of manufacture claim corresponding to method claim 2. Claim 12 is rejected for the same reasons as claim 2.
Claim 14 is an article of manufacture claim corresponding to method claim 4. Claim 14 is rejected for the same reasons as claim 4.
Claim 15 is an article of manufacture claim corresponding to method claim 5. Claim 15 is rejected for the same reasons as claim 5.

Conclusion
US 20060059112 A1 to Cheng et al is considered pertinent due to the feature value evaluation disclosed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL G SMITH whose telephone number is (571)272-9730. The examiner can normally be reached on Monday-Friday from 9:30 A.M. to 6:00 P.M. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at telephone number 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (tollfree). 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Respectfully Submitted,
/PAUL GORDON SMITH/
Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126