Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-2, 4-5, 9-12, 14-15, and 18-20 are pending in the present application. Claims 3, 6-8, 13, and 16-17 are cancelled. Claims 1, 9, 11, and 18-19 are newly amended.

Response to Arguments
Applicant's arguments filed 2/8/2022 have been fully considered but they are not persuasive. 

Arguments regarding rejections under §101 (p. 8):
Per applicant’s argument that “the claims as presented herein are not directed to a Mathematical Formula or Mathematical Expression, but instead, recite specific steps that are part of a technique for deriving features for use with a machine learning model. The recited steps are not a pure mental process, because, as a practical matter, the steps cannot be performed in the human mind.”
The examiner thanks the applicant for their response and provides the following clarification. The splitting of numerical features, at the high level of generality presently claimed, is a mathematical operation capable of being performed in the human mind. For example, given a “feature” of numerical ages from 1 to 100, a person could readily split said feature, in their mind, into a plurality of buckets better suited to prediction. A person might, for example, split the feature into 5 buckets, where bucket B is ages 20-40 and bucket C is ages 40-60, and readily, in their mind, create a cross feature BC in order to predict people of working age.
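For illustration only, the age example above can be sketched in a few lines of Python. The interior cut points, the bucket labels A through E, the left-closed treatment of boundary ages, and the function names are all assumptions made for this sketch; they are not taken from the claims or from the cited references.

```python
from bisect import bisect_right

# Assumed interior cut points giving five buckets over ages 1-100.
EDGES = [20, 40, 60, 80]
LABELS = ["A", "B", "C", "D", "E"]


def bucket_of(age: int) -> str:
    """Return the label of the bucket containing `age`.

    Buckets are treated as left-closed, so an age of exactly 40
    falls in bucket C rather than bucket B (an assumption).
    """
    return LABELS[bisect_right(EDGES, age)]


def cross_bc(age: int) -> int:
    """Indicator for the combined B/C range used as a 'working age' feature."""
    return int(bucket_of(age) in ("B", "C"))
```

Under these assumptions, `cross_bc` returns 1 for ages 20 through 59 and 0 otherwise, which is the kind of determination a person could make without a computer.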
Arguments regarding rejections under §103 (pp. 8-10):
Per applicant’s argument that “For example, neither the Luo or Hong references describe identifying a plurality of possible splits of a numeric feature, and then generating a cross feature for each split in a first plurality of splits, followed by generating a cross feature for each split in a second possible plurality of splits.”
Luo discloses the identification of a plurality of possible splits of a numeric feature in section 4.4 Preprocessing (see especially Figure 5). In section 4.2 and Figure 3, Luo then crosses those split features with a plurality of unique primary features. The examiner recognizes that a unique feature A, if it were numerical, would have been processed into a second feature. Thus, Luo generates a cross feature AB that is based on a second feature and a third feature different from both the first feature and the second feature. The splitting of a numeric feature and the generation of cross features based on that split can be repeated (see at least Figure 5 and section 4.5 Termination). Thus, Luo discloses each and every aspect of the above limitations.

Per applicant’s argument that “The Hong reference has nothing to do with feature engineering via deriving cross features, and as such, no person of skill in the art would have considered combining Hong with Luo in the first place.”
In response to applicant's argument that Hong is nonanalogous art, it has been held that a prior art reference must either be in the field of applicant’s endeavor or, if not, then be reasonably pertinent to the particular problem with which the applicant was concerned, in order to be relied upon as a basis for rejection of the claimed invention.  See In re Oetiker, 977 F.2d 1443, 24 USPQ2d 1443 (Fed. Cir. 1992).  In this case, both Luo and Hong are directed to feature analysis and feature splitting for model generation, and so are both clearly pertinent to the particular problem with which the applicant was concerned.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2, 4-5, 9-12, 14-15, and 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 analysis:
In the instant case, the claims are directed to a method (claims 1-2, 4-5, and 9-10) and an article of manufacture (claims 12, 14-15, and 18-20). Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).

Step 2A analysis:
Because the claims have been determined to fall within one of the four statutory categories (Step 1), it must next be determined whether the claims are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea). In this case, the claims fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental processes (including an observation, evaluation, judgement and opinion)” and “Mathematical calculations”.

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 1 and 11:
“identifying a first plurality…” (observation);
“transforming the first feature…” (mathematical calculation);
“generating a cross feature…” (mathematical calculation);
“estimating a predictive…” (mathematical calculation);
“adding the predictive power to a set…” (observation);
“selecting…highest estimated…” (observation);
“splitting… to generate…” (mathematical calculation);
“determining a minimum resolution…” (judgement);
“determining a minimum value…” (judgement);
“determining a maximum value…” (judgement);
“wherein identifying…” (evaluation);
“removing the first from…possible splits” (evaluation);
“transforming the fourth feature…” (mathematical calculation);
“generating a second cross feature…” (mathematical calculation);
“estimating a second predictive…” (mathematical calculation);
“adding the second predictive power to a second set…” (observation);
“selecting…highest estimated…” (observation);
“splitting… to generate…” (mathematical calculation);

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional elements in claims 1 and 11, “computing devices”…“storage media storing instructions”…“one or more processors”, correspond to mere instructions to implement an abstract idea or other exception on a generic computer. Applying an otherwise abstract idea to a generic computer does not make the claim patentable; see MPEP 2106 (I), “An abstract idea does not become nonabstract by limiting the invention to a particular field of use or technological environment, such as the Internet [or] a computer”. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
The limitation “wherein the method is performed by one or more computing devices” is merely the application of the abstract idea to a generic computer. Thus, the claims as a whole do not amount to significantly more than the judicial exception.

Step 2A, Prong 1 analysis:
Claims 2 and 12:
“wherein the fourth feature comprises…” (mathematical calculation);

Step 2A: Prong 2 analysis:
The further limitations in claims 2 and 12 are directed to a judicial exception and nothing more. Accordingly, these further limitations do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 4 and 14:
“identifying a particular bucket…” (judgement);
“generating a first bucket…” (evaluation);
“wherein a first boundary…first bucket…” (judgement);
“wherein a second boundary…first bucket…” (evaluation);
“wherein a first boundary…second bucket…” (evaluation);
“wherein a second boundary…second bucket…” (judgement);


Step 2A: Prong 2 analysis
The further limitations in claims 4 and 14 are directed to a judicial exception and nothing more. Accordingly, these further limitations do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A, Prong 1 analysis:
Claims 5 and 15:
“wherein estimating…comprises calculating…” (mathematical calculation);

Step 2A: Prong 2 analysis
The further limitations in claims 5 and 15 are directed to a judicial exception and nothing more. Accordingly, these further limitations do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Step 2A: Prong 1 analysis:
The claim(s) recite(s):
Claims 9 and 19:
“wherein a count has a particular value…” (observation);
“incrementing the count” (mathematical calculation);
“after selecting… determining whether the count equals a threshold value” (mathematical calculation);
“in response… transforming the fourth feature…” (mathematical calculation);
“incrementing the count” (mathematical calculation);
“after selecting… determining whether the count equals a threshold value” (mathematical calculation);
“in response...using…feature when training a model” (repetitive calculation);

Step 2A: Prong 2 analysis:
This judicial exception is not integrated into a practical application because the additional element in claims 9 and 19, “in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model,” is merely insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Step 2B analysis:
The limitation “in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model” is the performance of a repetitive calculation, which the courts have recognized as well-understood, routine, and conventional (MPEP 2106.05(d)(II)).

Step 2A, Prong 1 analysis:
Claims 10 and 20:
“wherein the first feature is…” (observation);

Step 2A: Prong 2 analysis
The further limitations in claims 10 and 20 are directed to a judicial exception and nothing more. Accordingly, these further limitations do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 9-11, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over “AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications” to Luo et al. (hereinafter, Luo) in view of “Use of Contextual Information for Feature Ranking and Discretization” to Hong (hereinafter, Hong).

As per claim 1, Luo teaches A computer-implemented method comprising: for a first feature that is a numeric feature, identifying a first plurality of possible splits of the first feature [based on a determined minimum resolution of the first feature, a determined minimum value of the first feature, and a determined maximum value of the first feature, wherein the size of buckets for the numeric feature resulting from the first plurality of possible splits is nonuniform] (4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity. In order to avoid the dramatic increase in feature number caused by discretization, once these features are generated, we use fieldwise LR (without considering bsum) to evaluate them and keep only the best half. A remaining problem is how to determine the levels of granularity. For an experienced user, she can set a group of potentially good values. If no values are specified, AutoCross will use {10^p}_{p=1}^{P} as default values, where P is an integer determined by a rule-based mechanism that considers the available memory, data size and feature numbers.” Fig. 5.);
for each split of the first plurality of possible splits: transforming the first feature into a second feature based on said each split (4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Fig. 5, See transformation of original numerical feature into 1st discretized feature.);
generating a cross feature based on the second feature and a third feature that is different than the first feature and the second feature (4.2 Feature Generation, p4-5, “In AutoCross, we consider a tree-structured space T depicted in Figure 3, where each node corresponds to a feature set and the root is the original feature set F. For simplicity, in this example, we denote the crossing of two features A and B as AB, and higher-order cross features in similar ways. For a node (a feature set), its each child is constructed by adding to itself one pair-wise crossing of its own elements. The pair-wise interactions between cross features (or a cross feature and an original feature) will lead to high-order feature crossing… First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next” Figure 3, Algorithm 1. Examiner Note: Luo processes numeric features into a second feature prior to crossing, as seen in 4.4 Preprocessing. In 4.2 and Figure 3, Luo then crosses features with a plurality of unique primary features. The examiner recognizes that a unique feature A, if it were numerical, would have been processed into a second feature. Thus, Luo generates a cross feature AB that is based on a second feature and a third feature different from both the first feature and the second feature.);
estimating a predictive power of the cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power.);
splitting the first feature based on the first split to generate a fourth feature that is different than the first feature (Figure 5, 2nd discretized feature. 4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Examiner Note: The 2nd discretized feature is created by the first split (1st discretized feature) and is different from the first feature.);
removing the first split from the first plurality of possible splits to create a second plurality of possible splits (4.2 Feature Set Generation, p. 4 “We consider the feature crossing problem (Problem (4)). Assume the size of the original feature set is d, which is also the highest order of cross features… the number of all possible feature sets is 2^(2^d−1), a double exponential function of d” Algorithm 1. 4.5 Termination, “Three kinds of termination conditions are used in AutoCross: 1) runtime condition…2) performance condition…3) maximal feature number” Examiner Note: Luo applies feature discretization to any numerical features that it processes. Luo additionally processes a plurality of features. When Luo’s system is finished processing a first numerical feature, and has additional numerical features to process, it would remove the processed feature from the queue, and create a second plurality of possible splits on the second feature in order to begin its numerical feature preprocessing.); 
for each split of the second plurality of possible splits: transforming the fourth feature into a fifth feature based on said each split (Figure 5, 3rd discretized feature. 4.4 Preprocessing p6, “The most simple and widely-used discretization method is equal-width discretization, i.e., to split the value range of a feature into several equal-width intervals… The basic idea is simple: instead of using a fine-tuned granularity, we discretize each numerical feature into several, rather than only one, categorical features, each with a different granularity.” Examiner Note: The 3rd discretized feature is created by the second split (2nd discretized feature) and is different from the second feature.); 
generating a second cross feature of the fifth feature and the third feature (4.2 Feature Generation, p4-5, “In AutoCross, we consider a tree-structured space T depicted in Figure 3, where each node corresponds to a feature set and the root is the original feature set F. For simplicity, in this example, we denote the crossing of two features A and B as AB, and higher-order cross features in similar ways. For a node (a feature set), its each child is constructed by adding to itself one pair-wise crossing of its own elements. The pair-wise interactions between cross features (or a cross feature and an original feature) will lead to high-order feature crossing… First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next” Figure 3, Algorithm 1. Examiner Note: Luo processes numeric features into a second feature prior to crossing, as seen in 4.4 Preprocessing. In 4.2 and Figure 3, Luo then crosses features with a plurality of unique primary features. The examiner recognizes that a unique feature A, if it were numerical, would have been processed into a second feature. Thus, Luo generates a cross feature AB that is based on a second feature and a third feature different from both the first feature and the second feature.); 
estimating a second predictive power of the second cross feature (Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows” Examiner Note: Luo’s performance of a feature in terms of contribution to a learning algorithm is seen as equivalent to the instant application’s predictive power.); and
splitting the fourth feature based on the second split to generate a sixth feature that is different than the first, third, and fourth features (Fig. 5, 3rd discretized feature. Examiner Note: The 3rd discretized feature is generated from the 2nd split (2nd discretized feature) and is different from the other created features.).
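For illustration only, the equal-width, multi-granularity discretization and pair-wise crossing quoted from Luo above can be sketched as follows. This is a minimal sketch and not Luo's implementation: the function names, the default granularities of 10 and 100 (taken as the first two values of the quoted default series), the clamping of the maximum value into the last bin, and the "&" separator used to form crossed values are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_multi(values, granularities=(10, 100)):
    """Produce one categorical feature per granularity (assumed defaults)."""
    return {g: equal_width_bins(values, g) for g in granularities}


def cross(feature_a, feature_b):
    """Cross two categorical features row by row into a single feature."""
    return [f"{a}&{b}" for a, b in zip(feature_a, feature_b)]
```

In this sketch, each entry returned by `discretize_multi` plays the role of one discretized feature, and `cross` pairs such a feature with another categorical feature in the manner of the crossing AB described in the examiner note.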

Luo does not explicitly teach: for a first feature that is a numeric feature, identifying a first plurality of possible splits of the first feature based on a determined minimum resolution of the first feature, a determined minimum value of the first feature, and a determined maximum value of the first feature, wherein the size of buckets for the numeric feature resulting from the first plurality of possible splits is nonuniform; adding the predictive power to a set of estimated predictive powers; selecting a first cross feature that is associated with the highest estimated predictive power in the set of estimated predictive powers; adding the second predictive power to a second set of estimated predictive powers; and selecting a third cross feature that is associated with the highest estimated predictive power in the second set of estimated predictive powers, wherein the third cross feature corresponds to a second split in the second plurality of possible splits.

Hong teaches for a first feature that is a numeric feature, identifying a first plurality of possible splits of the first feature based on a determined minimum resolution of the first feature, a determined minimum value of the first feature, and a determined maximum value of the first feature, wherein the size of buckets for the numeric feature resulting from the first plurality of possible splits is nonuniform (Section 8, p725, “When a numeric feature Xk is discretized, the new component distance becomes 1 if and only if there is a cut point between the pair of values, xki and xkj, and 0 otherwise…An entry in the SPANk is an interval on the value line of Xk, specified by its beginning and ending points and a weight value which is the approximate merit contribution should the interval be cut.” “NFD2) Perform CM to obtain the contextual merits and SPAN.” Section 8 p. 725, “The SPAN list thus obtained is used to discretize all the numeric features.” “NFD5.2) Perform IC: Monitor the progress of merit increases as the number of cuts increase. Terminate IC either by manual or automatic method based on Di parameters. Return the best cuts for the chosen c.” Examiner Note: Hong’s setting of component distance based on whether the numeric feature is to be cut is seen as equivalent to determining the minimum resolution of the numeric feature (either 0 in the case of no cut, or 1 in the case of at least one cut). In Hong’s Numeric Feature Discretization algorithm, Hong determines the SPAN of each numeric feature. Per section 8, the span of a feature includes both the minimum and maximum of that feature. As such, determining the SPAN of a feature is seen as equivalent to determining the minimum and maximum value of that feature.);
adding the predictive power to a set of estimated predictive powers (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “i”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 i 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance. However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later. Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set.);
selecting a first cross feature that is associated with the highest estimated predictive power in the set of estimated predictive powers, wherein the first cross feature corresponds to a first split in the first plurality of possible splits (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “i”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 i 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance (see Fig. 4, selected features, 4.3 “After a candidate is selected to replace the current solution S∗ (Step 6, Algorithm 1), we train an LR model with the new S∗, evaluate its performance…”). However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later (see Luo Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows”). Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set. The system would then select the crossed feature associated with the highest performance. When a numerical feature is discretized according to Luo 4.4 Preprocessing as described above, the cross feature created by a first split of the feature according to a plurality of possible splits corresponds to that first split (See Luo Algorithm 1, “return S*”).);
adding the second predictive power to a second set of estimated predictive powers (4, A New Contextual Merit Function for Features p 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721 “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “i”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 i 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance. However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later. Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set.); and
selecting a third cross feature that is associated with the highest estimated predictive power in the second set of estimated predictive powers, wherein the third cross feature corresponds to a second split in the second plurality of possible splits (Hong, 4, A New Contextual Merit Function for Features, p. 720, “In general, one feature does not distinguish classes by itself; it does so in combination with other features. Therefore, it is desirable to obtain the feature’s correlation to the class in the context of other features. We seek a merit function that captures this contextual correlation implicitly, since enumerating all possible contexts is impractical.” p. 721, “We give the contextual merit algorithm to compute M = (m1, m2, ..., mNf) according to (5) here” p. 722, “The first example case is an EXOR(4, 10, 1,000). The CM algorithm produces the following merit values for the first four base variables and the remaining 10 random variables (delineated by a “i”). Since we are interested in the relative importance, the values shown are decimal shifted and rounded so that the maximum is a three-digit number: M = (184 194 176 176 i 89 87 93 90 89 87 89 92 93 87).” Examiner Note: Luo teaches crossing features, evaluating those crossed features by their performance, and selecting the crossed feature with the highest performance (see Fig. 4, selected features, 4.3 “After a candidate is selected to replace the current solution S∗ (Step 6, Algorithm 1), we train an LR model with the new S∗, evaluate its performance…”). However, Luo stores only the best performing crossed feature, replacing the stored best feature when it determines a new crossed feature to have a greater performance, rather than storing all the performance scores and comparing the stored scores later (see Luo Algorithm 1, “evaluate all candidate feature sets…” 4.2 Feature Generation, p. 5, “First we generate all children nodes of the root, evaluate their corresponding feature sets and choose the best performing one to visit next.” 4.3 Feature Set Evaluation, p. 5, “First, we use logistic regression (LR) trained with mini-batch gradient descent to evaluate candidate feature sets, and use the corresponding performance to approximate the performance of the learning algorithm L that actually follows”). Hong also evaluates potential features, and stores the scores of those features in a set M upon evaluation. When Hong is applied to Luo, the resulting system would evaluate crossed features based on their predictive power, and add those estimated powers to a set. The system would then select the crossed feature associated with the highest performance. When a numerical feature is discretized according to Luo 4.4 Preprocessing as described above, the cross feature created by a second split of the feature according to a plurality of possible splits corresponds to that second split (See Luo Algorithm 1, “return S*”).).
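
For illustration only, the distinction drawn in the Examiner Note above (retaining every estimated predictive power in a set and selecting the maximum afterwards, rather than overwriting a single running best) can be sketched as follows. This is a minimal sketch that is not taken from Luo or Hong; the function name, the candidate labels, and the toy scores are all hypothetical.

```python
def select_best_cross_feature(candidates, estimate_power):
    """Score every candidate, keep all scores, then pick the maximum.

    `candidates` is an iterable of cross-feature names and `estimate_power`
    is any callable returning a numeric score (both are assumptions).
    """
    powers = {}  # the "set of estimated predictive powers", keyed by candidate
    for name in candidates:
        powers[name] = estimate_power(name)  # add each score to the set
    best = max(powers, key=powers.get)  # select only after all scores are stored
    return best, powers


# Hypothetical scores standing in for a real evaluation of each split.
toy_scores = {
    "age<30 x region": 0.12,
    "age<50 x region": 0.31,
    "age<70 x region": 0.27,
}
best, powers = select_best_cross_feature(toy_scores, toy_scores.get)
print(best)
```

Each candidate here corresponds to one possible split of a numeric feature, so the selected cross feature identifies the split that produced it.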

Luo and Hong are analogous art because they are both directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy of the system, which can be accomplished through selection of high merit features (Hong, Application Experience, p. 728, “The contextual feature analysis and discretization algorithms described in this paper are incorporated into our data abstraction system called RAMP… For these real cases, the resultant accuracy of the RAMP generated rules were either comparable or higher than those obtained from SWAP1, C4.5, or CART, some significantly.”).

As per claim 9, the combination of Luo and Hong thus far teaches The method of claim 1.

Luo teaches further comprising: incrementing a count, wherein the count has a particular value prior to selecting the first cross feature (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The count would inherently begin at 0 when no cross features have been generated.);
after selecting the first cross feature, determining whether the count equals a threshold value (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The count would inherently increment based on a cross feature being generated.);
in response to determining that the count does not equal the threshold value, transforming the fourth feature into the fifth feature (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
incrementing the count (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
after selecting the third cross feature, determining whether the count equals the threshold value (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process would inherently continue so long as the threshold has not been reached.);
in response to determining that the count equals the threshold value, using the third cross feature as a feature when training a model (4.5 Termination, p. 6 “3) maximal feature number: the user can give a maximal cross feature number so that AutoCross stops when the number is reached” 4.3 Feature Set Evaluation, “A vital step in Algorithm 1 is to evaluate the performance of candidate feature sets (Step 4). Here, the performance of a candidate set S is expressed as E (L(Dt r,S), Dvld ,S) (see Problem (4)), denoted as E(S) for short. To directly estimate it, we need to learn a model with algorithm L on the training set represented by S and evaluate its performance on the validation set. Though highly accurate, direct evaluation for feature sets is often rather expensive. In real-world business scenarios, training a model to convergence may take great computational resource” Examiner Note: Luo teaches a count of cross features, where the program termination happens at a given threshold number of cross features. The process is stopped upon the count reaching the threshold limit, and would return the set of features including the third cross feature for use in training a model).
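
For illustration only, the count-and-threshold behavior discussed for claim 9 can be sketched as a loop that increments a count each round, selects a cross feature, and stops once the count equals a user-supplied maximum. This is a minimal sketch under stated assumptions, not Luo's implementation; `run_rounds`, its arguments, and the toy data are hypothetical.

```python
def run_rounds(rounds, threshold):
    """Select one cross feature per round until `count` equals `threshold`.

    `rounds` is a list of rounds; each round is a list of
    (cross_feature_name, estimated_power) pairs (an assumed data layout).
    """
    count = 0  # no cross feature has been selected yet
    selected = []
    for candidates in rounds:
        count += 1  # increment the count for this round
        best_name, _ = max(candidates, key=lambda pair: pair[1])
        selected.append(best_name)
        if count == threshold:  # stop when the maximum feature number is reached
            break
        # Otherwise continue: the next round evaluates a further set of splits.
    return selected, count


toy_rounds = [
    [("a", 0.1), ("b", 0.3)],
    [("c", 0.2), ("d", 0.5)],
    [("e", 0.9)],
]
selected, count = run_rounds(toy_rounds, threshold=2)
```

With a threshold of 2, the third round is never evaluated and the features selected so far are what would be passed on for model training.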

As per claim 10, the combination of Luo and Hong thus far teaches The method of claim 1.
Luo teaches wherein the first feature is a time-based feature and the second feature is a categorical feature (Section 2, Motivation, “While most early works of automatic feature generation focus on second-order interactions of original features [5, 6, 20, 22, 37], trends have appeared to consider higher-order (i.e., with order higher than two) interactions to make data more informative and discriminative [2, 27, 35, 44]. High-order cross features, just like other high-order interactions, can further improve the quality of data and increase predictive power of learning algorithms. For example, a third-order cross feature ‘item ⊗ time ⊗ region’ can be a strong feature to recommend regionally preferred food during certain festivals”).
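
For illustration only, a cross of a time-based feature with a categorical feature, in the spirit of the ‘item ⊗ time ⊗ region’ example quoted above, might look like the following. This is a minimal sketch; the evening/daytime discretization, the helper names, and the sample rows are hypothetical and do not come from Luo.

```python
from datetime import datetime


def time_of_day(ts):
    # Hypothetical discretization of a timestamp into a coarse bucket.
    return "evening" if ts.hour >= 18 else "daytime"


def cross(time_bucket, category):
    # A cross feature takes one value per combination of its parents' values.
    return f"{time_bucket}&{category}"


rows = [
    (datetime(2021, 12, 24, 19, 30), "north"),
    (datetime(2021, 12, 24, 9, 15), "north"),
    (datetime(2021, 12, 25, 20, 0), "south"),
]
crossed = [cross(time_of_day(ts), region) for ts, region in rows]
```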

Claim 11 is an article of manufacture claim corresponding to method claim 1. Claim 11 requires One or more storage media storing instructions which, when executed by one or more processors, cause (Abstract, “Furthermore, the algorithms are designed to reduce the computational, transmitting, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross.” Figure 2.). Claim 11 is rejected for the same reasons as claim 1.

Claim 18 is an article of manufacture claim corresponding to method claim 8. Claim 18 is rejected for the same reasons as claim 8.

Claim 19 is an article of manufacture claim corresponding to method claim 9. Claim 19 is rejected for the same reasons as claim 9.

Claim 20 is an article of manufacture claim corresponding to method claim 10. Claim 20 is rejected for the same reasons as claim 10.



Claims 2, 4, 5, 12, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over “AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications” to Luo et al (hereinafter, Luo) in view of “Use of Contextual Information for Feature Ranking and Discretization” to Hong (hereinafter, Hong), further in view of CN 109101562 A to Zhou (hereinafter, Zhou).

As per claim 2, the combination of Luo and Hong thus far teaches The method of claim 1. 

Luo and Hong teach the splitting and discretization of numerical features, but do not explicitly teach wherein the fourth feature comprises one more bucket than the first feature.
Zhou teaches wherein the fourth feature comprises one more bucket than the first feature (Page 2, “The classification partition of the first specified number corresponding to the first feature, the plurality of selected sample into first sample of first specified number; screening to meet the target of the first pre-set condition, the first sample from each of said first sample, wherein the target first sample is one or more; a plurality of features in the target first sample comprises obtaining the influence information of the first sample of the target maximum second feature, the second feature and the first feature is different; The classification partition of the second specified number of the second character corresponding to the first sample of the target into a second sample of the second specified number;” Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 5-6, “This embodiment takes the selected one target first sample as an example, interpret the dividing process of the target first sample, the other second divided echelons of other target first sample as the same processing. 
to female first sample as an example in this embodiment, according to the decision tree method to find the target first sample influence the maximum information amount of the second characteristic, such as age. gender due to "sex" as the first feature after dividing the sample to obtain the first sample in the same target are the same, ordering the characteristic importance, "gender" this feature will not be sorted, for example, this characteristic in descending order, the first bit is "age", the second characteristic is "age".” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: Luo teaches splitting a feature into a plurality of subfeatures, but avoids increasing the number of splits in order to increase computational efficiency (4.4 Preprocessing “In order to avoid the dramatic increase in feature number caused by discretization, once these features are generated, we use fieldwise LR (without considering bsum) to evaluate them and keep only the best half.”). Zhou teaches splitting a population feature twice, wherein the first split can have fewer buckets than the second (e.g., the first split is gender with 2 buckets, and the 2nd is age with 5). When Zhou is applied to Luo, the resulting system would split features into a variable quantity of buckets, rather than reducing the number of buckets in each split. The system would be clearly capable of forming a fourth feature with one more bucket than the first feature.).

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s bucketing system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and can make the working target has more pertinence, more obvious working effect. Therefore, accurately find the target population has actual application value in the large data.”).

As per claim 4, the combination of Luo and Hong thus far teaches The method of claim 1. 

Luo and Hong teach the splitting and discretization of numerical features, but do not explicitly teach the generation of buckets split from a feature.

Zhou teaches wherein transforming the first feature into the second feature based on the said each split comprises: identifying a particular bucket of the first feature that is to be split based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: Luo teaches discretization of a numerical feature, but does not explicitly state their bucketing procedure in that process. Zhou explicitly teaches splitting a population feature into a plurality of buckets, where the boundary of one bucket is shared with another bucket. In this case, the first feature is a population of people, and the particular bucket being split is the age of those people.);
generating a first bucket and a second bucket based on the particular bucket; wherein a first boundary of the first bucket is the same as a first boundary of the particular bucket (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: The first bucket’s first boundary is age 0, and the first boundary of the total age bucket is also 0.);
wherein a second boundary of the first bucket is based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: The particular bucket contains values from 0 to 100, and is split 5 times. Thus, the second boundary of the first bucket is placed at 20, in order to create 5 buckets.);
wherein a first boundary of the second bucket is based on said each split (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Examiner Note: The particular bucket contains values from 0 to 100, and is split 5 times. Thus, the first boundary of the second bucket is placed at 20, in order to create 5 buckets.);
wherein a second boundary of the second bucket is the same as a second boundary of the particular bucket (Page 5, “For example, each feature is pre-divided classification partition of appointed number, so the first characteristic also corresponding first classification sub-specified number exists, then the preselected sample dividing the preselected sample according to classification zone of the first characteristic, the first specified number and classification sub-number of the first feature. such first characteristic is gender, gender including male and female two sorting partition, then the first specified number is two, the selected sample is divided into two first samples, one of which is a first sample of the female, and the other one is a male first sample. and for example, first feature is age, age is pre-dispersed to (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five sorting partition, then the first specified number is five, the preselected sample into five first sample.” Page 6, “For example, the present embodiment of the discrete age is (0, 20), (20, 40), (40, 60), (60, 80), [80,100] five intervals, the target first sample is correspondingly divided into five second sample respectively corresponding to five sections. equal to the further refining of the preselected sample, to find target people purchase rate is high.” Page 7, “In this embodiment, first feature different properties of the pre-selected dividing standard sample are different, different processing method of dividing. 
such as category features contained only according to the dividing standard sample into the sort of number of sort determines the number of the classification areas, numerical model characteristic can be discrete into a plurality of continuous distributions of the data section according to the need, then the samples divided according to a plurality of data intervals, number of data section determines the number of the classification zone.” Examiner Note: In the example cited by Zhou, 5 buckets are used when sorting age. However, [citation 1] indicates that the number of buckets is variable. The examiner recognizes that if Zhou had used 2 buckets when sorting the population by age, where the population ages form the set (0, 100), the two buckets created would have been (0, 50) and (50, 100). Thus, the second bucket would have shared a second boundary (100) with the total population age distribution.).
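
For illustration only, the boundary relationships discussed for claim 4 can be sketched by representing a feature's buckets as a sorted list of edges and inserting one split point. This is a minimal sketch that is not taken from Zhou or Luo; `split_bucket` and the edge-list representation are assumptions made for the example.

```python
import bisect


def split_bucket(boundaries, split):
    """Insert one split point into a sorted list of bucket edges.

    [0, 100] denotes a single bucket; [0, 50, 100] denotes two buckets.
    The bucket containing `split` is replaced by two buckets that keep its
    outer edges and share `split` as their common inner edge.
    """
    if not boundaries[0] < split < boundaries[-1] or split in boundaries:
        raise ValueError("split must fall strictly inside an existing bucket")
    i = bisect.bisect_left(boundaries, split)  # position of the bucket's upper edge
    return boundaries[:i] + [split] + boundaries[i:]


two_buckets = split_bucket([0, 100], 50)
six_buckets = split_bucket([0, 20, 40, 60, 80, 100], 50)
```

In the first call the resulting buckets are (0, 50) and (50, 100), matching the two-bucket hypothetical in the Examiner Note; in the second, the five age intervals become six, that is, one more bucket than before the split.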

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s bucketing system. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and can make the working target has more pertinence, more obvious working effect. Therefore, accurately find the target population has actual application value in the large data.”).

As per claim 5, the combination of Luo and Hong thus far teaches The method of Claim 1.

Luo and Hong teach estimation of a cross feature’s performance, but do not specifically teach that estimating the predictive power of the cross feature comprises calculating an entropy of a label and an entropy of the label given the cross feature.

Zhou teaches wherein estimating the predictive power of the cross feature comprises calculating an entropy of a label and an entropy of the label given the cross feature (Page 12, “Each feature of the embodiment respectively the effect value of the overall amount, obtained by the information gain algorithm, by calculating each feature individually after adding calculating process, affecting the magnitude of the overall entropy to obtain the influence value. information gain algorithm is calculated as follows: g (D, A) = H (D) - H (D | A), wherein g (D, A) represents A feature affecting the magnitude of the overall entropy, H (D) represents the entropy of the preselected sample, H (D | A) represents entropy characteristic of sample A after dividing. The other embodiment of the application can be obtained by information gain respectively the influence value of each feature of the general information, the information gain by introducing penalty parameter correction, to reduce the influence of the small entropy of the small sample, namely, the information gain is penalty parameter * information gain.” Examiner Note: Luo teaches estimating the predictive power of a cross feature, but does not calculate the entropy of the feature or of the object when doing so. Zhou judges the value of a potential feature based on the entropy of both a label, and a label given the addition of a feature. When Zhou is applied to Luo, the resulting system would estimate performance of a cross feature based on the entropy of the label being predicted, and the entropy of the label being predicted using the cross feature.).
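
For illustration only, the information gain relied upon above, which is the entropy of the label minus the entropy of the label given the feature, can be computed as in the following minimal sketch. The function names and toy data are hypothetical, the inputs are assumed to be non-empty, and the penalty parameter mentioned in the quoted passage is not modeled.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    # Shannon entropy H(D) of a non-empty list of labels, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    # g(D, A) = H(D) - H(D | A), where A is the (cross) feature.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)  # partition the labels by feature value
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]
informative = information_gain(["a", "a", "b", "b"], labels)
uninformative = information_gain(["a", "b", "a", "b"], labels)
```

A feature whose values separate the labels perfectly attains the full label entropy as its gain, while a feature unrelated to the label has a gain of zero.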

Luo, Hong, and Zhou are analogous art because they are directed to feature generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Luo’s feature discretization and crossing system with Hong’s feature merit system and Zhou’s entropy calculation. The combination would have been obvious to one of ordinary skill in the art because they would have been motivated to increase accuracy and efficiency of the system, which can be accomplished through selection of features most pertinent to the model being trained (Zhou, page 1, “existing client data are present in the form of large data, find out the special data to be in the large population needs or is difficult. but the current applied from the large database screening to meet the needs of target population so as to more directly and effectively aiming at target population corresponding work, it not only can improve the working efficiency, and can make the working target has more pertinence, more obvious working effect. Therefore, accurately find the target population has actual application value in the large data.”).

Claim 12 is an article of manufacture claim corresponding to method claim 2. Claim 12 is rejected for the same reasons as claim 2.
Claim 14 is an article of manufacture claim corresponding to method claim 4. Claim 14 is rejected for the same reasons as claim 4.
Claim 15 is an article of manufacture claim corresponding to method claim 5. Claim 15 is rejected for the same reasons as claim 5.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 20060059112 A1 to Cheng et al is considered pertinent due to the feature value evaluation disclosed.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL G SMITH whose telephone number is (571)272-9730. The examiner can normally be reached on Monday-Friday from 9:30 A.M. to 6:00 P.M. EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at telephone number 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (tollfree). 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Respectfully Submitted,
/P.G.S./Examiner, Art Unit 2126
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145