DETAILED ACTION
Response to Arguments
Applicant's arguments with respect to the rejections under §112(b) have been fully considered and are persuasive. Accordingly, those rejections have been withdrawn.
Applicant’s arguments with respect to claims 1, 10 and 16 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hueter et al. US 2016/0019587 A1 (“Hueter”) in view of Ma, Haiying. "A Study on Customer Segmentation for E-Commerce Using the Generalized Association Rules and Decision Tree." American Journal of Industrial and Business Management 5.12 (2015) (“Ma”) and further in view of Zhou et al. "Pattern based sequence classification." IEEE Transactions on Knowledge and Data Engineering 28.5 (2015): 1285-1298 (“Zhou”).
Regarding claim 1, Hueter teaches a computer-implemented method for determining user segments created by a predictive model based on user behavioral data, the method comprising: 
receiving, by a computing device, training data and user input, the training data associated with training a predictive model and comprising a plurality of instances, each instance associated with a user interaction within a computer network and comprising a plurality of attributes and an outcome associated with the user interaction, and an input defining an outcome of interest (Hueter paras. 0047-0051, “FIG. 6 shows the use of the invention in a system that selects subjects to whom to recommend a specific item. The application using the recommendation service makes a Service Customer Request to the system. The request includes the attributes that are available and relevant to the request, which include but are not limited to information about the page being viewed, including category, search result, or specific item being viewed; information about the visitor, including age, gender, income, number of children, marital status, income, lifetime value, or other attributes; and information about the nature of the subject's visit to the site, including location (latitude, longitude, altitude, state, country, city, postal code, or other location information), time-of-day (adjusted for location), type of device, type of browser, connection speed, referring URL, search engine keyword or other attributes of the visit. The context attributes are processed through the previously trained segmentation model for the item of interest…The segmentation model returns a score for each possible available subject, whereby the scores indicate the relative probabilities of the subjects transacting the item.” Hueter teaches that the application using the recommendation service makes a Service Customer Request to the system, and the request includes the attributes that are available and relevant to the request (i.e. receiving, by a computing device, training data and user input, the training data associated with training a predictive model), which include but are not limited to information about the page being viewed, including category, search result, or specific item being viewed; information about the visitor, including age, gender, income, number of children, marital status, income, lifetime value, or other attributes; and information about the nature of the subject's visit to the site, including location (latitude, longitude, altitude, state, country, city, postal code, or other location information), time-of-day (adjusted for location), type of device, type of browser, connection speed, referring URL, search engine keyword or other attributes of the visit (i.e. and comprising a plurality of instances, each instance associated with a user interaction within a computer network and comprising a plurality of attributes); the segmentation model returns a score for each possible available subject, whereby the scores indicate the relative probabilities of the subjects transacting the item (i.e. and an outcome associated with the user interaction); and the system selects subjects to whom to recommend a specific item (i.e. and an input defining an outcome of interest));
generating, by the computing device, a set of conditions from the training data, each condition comprising an attribute and a range of values for the attribute(Hueter para. 0051-0064,  “The subject are ranked by their combined scores and then filtered according to any specified business rules, which may include rules for pricing, category matching, inventory, or other merchandising goals. Business rules may be based on any attributes of the context, including subject attributes and content metadata.” Hueter teaches The subject are ranked by their combined scores and then filtered according to any specified business rules (i.e. generating, by the computing device, a set of conditions from the training data) Business rules may be based on any attributes of the context, including subject attributes and content metadata (i.e. each condition comprising an attribute and a range of values for the attribute)); and presenting, by the computing device, the user segment to the operator at an interface(Hueter para. 0098, “FIG. 10 shows the parameter selection process based on the first level of candidate segments. This user interface would allow an operator, for example a merchandiser or marketing manager, to get an idea of which variables are predictive of subjects' intents to transact. The operator would then select which variables to include in the segmentation model.” Hueter teaches FIG. 10 shows the parameter selection process based on the first level of candidate segments. This user interface (i.e. and presenting, by the computing device, the user segment to the operator at an interface)).  
Hueter does not teach: determining, by the computing device, a set of relevant conditions from the set of conditions based on relevance to the outcome of interest according to the predictive model, wherein determining the set of relevant conditions comprises:  generating a first condition related to a first attribute; causing the predictive model to compute from a first input data instance corresponding to the first conditions, predicted outcomes for users represented in the first input data instance.
However Ma teaches: determining, by the computing device, a set of relevant conditions from the set of conditions based on relevance to the outcome of interest according to the predictive model (Ma, pgs., 815-816, “Step 1: The first stage of the model is to select the variety variable of the purchased commodities from all variables, forming a data item set. Each data within the set corresponds to one type of commodity, constituting a set of objects… Step 6: Extract rules from the pruned decision tree.” Ma teaches Extract rules from the pruned decision tree (i.e. determining, by the computing device, a set of relevant conditions from the set of conditions based on relevance to the outcome of interest according to the predictive model)), 
wherein determining the set of relevant conditions comprises: generating a first condition related to a first attribute (Ma, pg. 815, “Step 1: The first stage of the model is to select the variety variable of the purchased commodities from all variables, forming a data item set. Each data within the set corresponds to one type of commodity, constituting a set of objects. Calculate the degree of support for all possible rules: The support of rule X => Y in the data set is the ratio of numbers between data sets with X, Y and all arrangements.” Ma teaches calculate the degree of support for all possible rules: The support of rule X => Y in the data set is the ratio of numbers between data sets with X, Y and all arrangements. (i.e. wherein determining the set of relevant conditions comprises: generating a first condition related to a first attribute)), 
 causing the predictive model to compute from a first input data instance corresponding to the first conditions, predicted outcomes for users represented in the first input data instance (Ma, pgs., 815-817, “Step 4: On the second stage of the model, decision tree C5.0 can be used to add up and induce the features obtained out of the association rules… Results shows the outputs of analysis based on the above three outputs generated out of the integrated model. The accuracy ratios of the three rule sets are…85.38%, 93.57%, 82.46% with the decision tree C5.0 model, shown in Figure 3.” Ma teaches On the second stage of the model, decision tree C5.0 can be used to add up and induce the features obtained out of the association rules (i.e. causing the predictive model to compute from a first input data instance corresponding to the first conditions, predicted outcomes for users represented in the first input data instance))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hueter in view of Ma. The motivation to do so would be to combine association rules from big data with rule-based machine learning to produce better customer segmentation groups (Ma, pgs. 813-814, “With the continuous development of e-commerce, the traditional technique of customer segmentation has been unable to cope with the massive and complex customer data. Based on the data mining technique, the new analyzing technique provides new solutions to the massive data of complex customer segmentation. Through collecting and classifying customer information, the new technique intends to find out customer groups with different attribute features: the demand characteristics of the overall customer internal, the buying behavior, the browsing characteristics and etc. Then it subdivides customers, helps e-commerce businesses understand their customers, provides clustering customer groups with more suitable, comprehensive and customized service, selects the most exploitable target customer groups and finds out the most potential customers.”).
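For illustration only, the rule-support measure quoted from Ma in the mapping above (the share of records in the data set that contain both X and Y) can be sketched as follows. The function name `rule_support` and the basket data are hypothetical and appear neither in Ma nor in the application; this is a minimal sketch under those assumptions, not Ma's implementation.

```python
from fractions import Fraction


def rule_support(transactions, antecedent, consequent):
    """Share of transactions that contain every item of both X and Y."""
    items = set(antecedent) | set(consequent)
    if not transactions:
        return Fraction(0)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))


# Hypothetical purchase baskets, invented for illustration only.
baskets = [
    {"books", "music"},
    {"books", "music", "toys"},
    {"books"},
    {"toys"},
]

# Support of the rule {books} => {music}: 2 of the 4 baskets hold both items.
support = rule_support(baskets, {"books"}, {"music"})
print(support)  # 1/2
```

Exact fractions are used so that the ratio can be compared against a threshold without rounding concerns.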
 Hueter does not teach: determining a first relevance of the first condition by comparing the outcome of interest to the predicted outcomes output by the predictive model, including the first condition in the set of relevant conditions, based on the first relevance; generating a second condition related to a second attribute, causing the predictive model to compute, from a second input data instance corresponding to the second condition, second predicted outcomes for second users represented in the second input data instance, determining a second relevance of the second condition by comparing the outcome of interest to the second predicted outcomes output by the predictive model; and excluding the second condition from the set of relevant conditions, based on the second relevance, generating, by the computing device, a user segment associated with the set of relevant conditions including the first condition and excluding the second condition. 
However, Zhou teaches: determining a first relevance of the first condition by comparing the outcome of interest to the predicted outcomes output by the predictive model (Zhou, pg. 1288, see also Algorithm 1 e.g. “We say that a rule correctly classifies or covers a data object in SDB if the rule matches the sequence part of the data object and the rule’s consequent equals the class label part of the data object.”), 
including the first condition in the set of relevant conditions, based on the first relevance (Zhou, pg. 1290, e.g. see also table 1 and table 2, “To illustrate how the algorithms work, consider the training dataset given in Table 1 again. Assume min_sup =min_int =0.6, max_size=3 and min_conf=0.5. After finding frequent patterns in S1 and S2…we get the confident rules sorted using Definition 1 for itemset rules and sequence rules respectively, as shown in Table 2.
[Image: media_image1.png]
Assuming we use a database coverage threshold $\delta = 1$ only the rules shown in bold would survive the pruning stage.” Zhou teaches: As shown by Table 2, Itemset Rule of $ab \Rightarrow L_1$ with conf(r) 0.5 represents including the first condition in the set of relevant conditions, based on the first relevance);
generating a second condition related to a second attribute (Zhou, pg. 1290, e.g. see also table 1 and table 2, “To illustrate how the algorithms work, consider the training dataset given in Table 1 again. Assume min_sup =min_int =0.6, max_size=3 and min_conf=0.5. After finding frequent patterns in S1 and S2…we get the confident rules sorted using Definition 1 for itemset rules and sequence rules respectively, as shown in Table 2.
[Image: media_image1.png]
Assuming we use a database coverage threshold $\delta = 1$ only the rules shown in bold would survive the pruning stage.” Zhou teaches: As shown by Table 2, Itemset Rule of $cbd \Rightarrow L_2$ with conf(r) 0.5 represents generating a second condition related to a second attribute);
causing the predictive model to compute, from a second input data instance corresponding to the second condition, second predicted outcomes for second users represented in the second input data instance (Zhou, pg. 1291, see also Algorithm 8, “Returning to our running example, assume we are given a new data object $(s_9, L_2)$ [i.e. from a second input data instance corresponding to the second condition] with $s_9 = \langle a, x, b, y, c, d, z \rangle$. If we try using the rules discovered in Example 5, we can see that it is not easy to choose the correct classification rule, as all the remaining rules match $s_9$ … [u]sing SCII_MA [classifier] [i.e. causing the predictive model to compute]…we would re-rank the rules taking the cohesion of the antecedent in $s_9$ into account. In the end, rule $cd \Rightarrow L_2$ is chosen, as $C(cd, s_9) = 1$, while $C(ab, s_9) = \frac{2}{3}$ [i.e. second predicted outcomes for second users represented in the second input data instance] We see that SCII_MA classifies the new sequence correctly.”);
determining a second relevance of the second condition by comparing the outcome of interest to the second predicted outcomes output by the predictive model(Zhou, pg. 1288, see also Algorithm 1 e.g. “We say that a rule correctly classifies or covers a data object in SDB if the rule matches the sequence part of the data object and the rule’s consequent equals the class label part of the data object.”); and 
excluding the second condition from the set of relevant conditions, based on the second relevance (Zhou, pg. 1290, e.g. see also table 1 and table 2, “To illustrate how the algorithms work, consider the training dataset given in Table 1 again. Assume min_sup =min_int =0.6, max_size=3 and min_conf=0.5. After finding frequent patterns in S1 and S2…we get the confident rules sorted using Definition 1 for itemset rules and sequence rules respectively, as shown in Table 2.
[Image: media_image2.png]
Assuming we use a database coverage threshold $\delta = 1$ only the rules shown in bold would survive the pruning stage.” Zhou teaches: As shown by Table 2, Itemset Rule of $cbd \Rightarrow L_2$ with conf(r) 0.5 would be pruned/removed, since only the rules in bold would survive elimination; this represents the limitation of excluding the second condition from the set of relevant conditions, based on the second relevance);
generating, by the computing device, a user segment associated with the set of relevant conditions including the first condition and excluding the second condition (Zhou, pg. 1290, e.g. see also table 1 and table 2, “To illustrate how the algorithms work, consider the training dataset given in Table 1 again. Assume min_sup =min_int =0.6, max_size=3 and min_conf=0.5. After finding frequent patterns in S1 and S2…we get the confident rules sorted using Definition 1 for itemset rules and sequence rules respectively, as shown in Table 2.
[Image: media_image2.png]
Assuming we use a database coverage threshold $\delta = 1$ only the rules shown in bold would survive the pruning stage.” Zhou teaches: As shown by Table 2, if the sequence of data objects is $\langle a, b \rangle$ then the class label will be $L_1$ and if the sequence of data object is $\langle c, d \rangle$ then the class label will be $L_2$ represents generating, by the computing device, a user segment associated with the set of relevant conditions including the first condition and excluding the second condition).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hueter in view of Zhou. The motivation to do so would be to apply a better relevance strategy for ranking rules and to use those rules as feature vectors to be inputted into traditional machine learning models (Zhou, pg. 1286, “In addition, we now deploy a new top-k strategy for all the presented classifiers, instead of using only the highest ranked rule, as we did in our previous work. Moreover, we now propose using the discovered patterns as features in order to transform each sequence into a feature vector. We present a new feature vector representation approach by setting each feature value as the cohesion of the feature in a sequence. After this, we apply machine learning algorithms for sequence classification and find that this new feature vector representation approach outperforms the traditional presence-weighted feature vector representation approach.”).
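For illustration only, the cohesion-based re-ranking quoted from Zhou in the claim 1 mapping above can be sketched as follows. The office action does not reproduce Zhou's definition of cohesion, so the sketch assumes that the cohesion of a pattern in a sequence is the number of items in the pattern divided by the length of the shortest window of the sequence that contains all of them; this assumption is adopted because it reproduces the quoted values $C(cd, s_9) = 1$ and $C(ab, s_9) = \frac{2}{3}$. The function name `cohesion` and the dictionary `rules` are hypothetical, and the code is not Zhou's SCII_MA classifier.

```python
from fractions import Fraction


def cohesion(pattern, sequence):
    """Assumed definition: pattern size divided by the length of the
    shortest window of `sequence` that contains every pattern item.
    Returns 0 when no such window exists."""
    needed = set(pattern)
    best = None
    for start in range(len(sequence)):
        seen = set()
        for end in range(start, len(sequence)):
            if sequence[end] in needed:
                seen.add(sequence[end])
            if seen == needed:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break  # a longer window from this start cannot be shorter
    if best is None:
        return Fraction(0)
    return Fraction(len(needed), best)


# Sequence s9 and the two rules (antecedent -> class label) quoted from Zhou.
s9 = ["a", "x", "b", "y", "c", "d", "z"]
rules = {"ab": "L1", "cd": "L2"}

# Re-rank the matching rules by the cohesion of each antecedent in s9.
scores = {antecedent: cohesion(antecedent, s9) for antecedent in rules}
best_antecedent = max(scores, key=scores.get)
print(scores["cd"], scores["ab"], rules[best_antecedent])  # 1 2/3 L2
```

Under the stated assumption, the rule with antecedent cd ranks first and the predicted label is L2, which agrees with the passage quoted from Zhou.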
Regarding claim 2, Hueter in view of Ma and in view of Zhou teaches the method of claim 1, wherein generating the set of conditions further comprises: extracting each condition present in each instance of the training data, and aggregating each condition into the set of conditions (Hueter, para. 0057, “FIG. 7B shows an example of a transformation of attributes…The attributes employed in the invention may be considered as m-dimensional tuples ( or m-tuples) that are members of a set constructed from the Cartesian product of the sets of attributes of interest.” Hueter teaches FIG. 7B shows an example of a transformation of attributes (i.e. extracting each condition present in each instance of the training data) The attributes employed in the invention may be considered as m-dimensional tuples ( or m-tuples) that are members of a set (i.e. and aggregating each condition into the set of conditions)).
Regarding claim 3, Hueter in view of Ma and in view of Zhou teaches the method of claim 1, further comprising determining a relevance of the user segment based on a predicted outcome from the predictive model given the set of relevant conditions associated with the user segment (Hueter, paras. 0073-0076, “Calculate the density factor d=r/s, whereby r=(number of items of interest in peak sequence) and s=(number of all items in peak sequence). Note that d is number between 0 and 1…A better significance calculation is attained by replacing the formula in step 7 above with the following: $R = \frac{r - r_{avg}}{r + r_{avg}} > T$ where $r_{avg} = s \cdot N_p / N_{total}$ and for example T=2. The above process is repeated for all dimensions and cells.” Hueter teaches the following: $R = \frac{r - r_{avg}}{r + r_{avg}} > T$
                     where                         
                            
                                
                                    r
                                
                                
                                    a
                                    v
                                    g
                                
                            
                            =
                            s
                            ⋅
                            
                                
                                    N
                                
                                
                                    p
                                
                            
                            /
                            
                                
                                    N
                                
                                
                                    t
                                    o
                                    t
                                    a
                                    l
                                
                            
                        
                      and for example T=2 (i.e. determining a relevance of the user segment) Calculate the density factor d=r/s, whereby r=(number of items of interest in peak sequence) and s=(number of all items in peak sequence). Note that d is number between 0 and 1 (i.e. based on a predicted outcome from the predictive model given the set of relevant conditions associated with the user segment)).  
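The density factor d=r/s quoted above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name `density_factor` and the representation of a peak sequence as a list of item labels are hypothetical and are not taken from Hueter.

```python
from typing import Hashable, Iterable, Sequence


def density_factor(peak_sequence: Sequence[Hashable],
                   items_of_interest: Iterable[Hashable]) -> float:
    """Return d = r / s for one peak sequence.

    r = number of items of interest in the peak sequence
    s = number of all items in the peak sequence
    (The list-of-labels representation is an assumption for illustration.)
    """
    interest = set(items_of_interest)
    s = len(peak_sequence)
    if s == 0:
        raise ValueError("peak sequence must contain at least one item")
    r = sum(1 for item in peak_sequence if item in interest)
    return r / s


# Hypothetical peak sequence with four items, two of which are of interest.
peak = ["a", "b", "a", "c"]
d = density_factor(peak, {"a"})
```

Because r counts a subset of the s items in the sequence, d always falls between 0 and 1, consistent with the quoted note.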
Regarding claim 4, Hueter in view of Ma and in view of Zhou teaches the method of claim 3, further comprising: generating a second user segment; and determining, based on the relevance of the user segment and an other relevance of the second user segment, an optimal set of user segments, the optimal set of user segments comprising at least one of the user segment and the second user segment, wherein the optimal set of user segments include most relevant user segments used by the predictive model in predicting the outcome of interest (Hueter paras. 0083-0098, “The system can compose the sequences of several items created with step 2 of the attribute analysis in paragraph…into a single sequence Dpa…and subsequently analyze the resulting sequence Dpa… FIG. 9B shows an example of the composition of presence-absence sequences from several items into a composite sequence…we may wish to exploit the advantages of considering the presence or absence of events pertaining to a set of items of interest, instead of single items… a composition of the sequences for each individual item (or a collection of subsets of items) that aims to increase the significance of the resulting sequence instead of possibly decreasing it. We can accomplish this by constructing the composition as follows: $D_{pa} = w_1 D_{pa,1} + w_2 D_{pa,2} + w_3 D_{pa,3} + \dots$
…The above process of choosing the signs for $w_j$ successively one term at a time is intended to avoid the computational cost of a global optimization algorithm (such as simulated annealing or genetic programming) that would explore the choice of sign for each term independently, to arrive at the signs that maximize significance or variance in the sequence
Dpa….” Hueter teaches The system can compose the sequences of several items created with step 2 of the attribute analysis in paragraph…into a single sequence Dpa…and subsequently analyze the resulting sequence Dpa… FIG. 9B shows an example of the composition of presence-absence sequences from several items into a composite sequence (i.e. generating a second user segment) a composition of the sequences for each individual item (or a collection of subsets of items) that aims to increase the significance of the resulting sequence instead of possibly decreasing it. We can accomplish this by constructing the composition as follows:                         
                            
                                
$D_{pa} = w_1 D_{pa,1} + w_2 D_{pa,2} + w_3 D_{pa,3} + \dots$
(i.e. and determining, based on the relevance of the user segment and an other relevance of the second user segment) The above process of choosing the signs for $w_j$ successively one term at a time is intended to avoid the computational cost of a global optimization algorithm (such as simulated annealing or genetic programming) that would explore the choice of sign for each term independently, to arrive at the signs that maximize significance or variance in the sequence (i.e. an optimal set of user segments, the optimal set of user segments comprising at least one of the user segment and second user segment, wherein the optimal set of user segments include most relevant user segments used by the predictive model in predicting the outcome of interest)).
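As an illustration of the one-term-at-a-time sign selection that Hueter describes, the following Python sketch builds a composite sequence greedily. It is a sketch under assumptions, not Hueter's implementation: the function name `compose_greedy` is hypothetical, the weight magnitudes are taken as given inputs, and population variance stands in for the "significance or variance" measure, which the quoted passage does not define further.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def compose_greedy(sequences: Sequence[Sequence[float]],
                   magnitudes: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Build D_pa = w_1*D_pa,1 + w_2*D_pa,2 + ... one term at a time.

    For each term, the sign of w_j is chosen so that the running sum has the
    larger variance (assumed stand-in for "significance").
    """
    if not sequences or len(sequences) != len(magnitudes):
        raise ValueError("need one magnitude per sequence, and at least one sequence")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must share the same non-zero length")

    composite = [0.0] * length
    signed_weights: List[float] = []
    for seq, mag in zip(sequences, magnitudes):
        best_score, best_weight, best_sum = None, 0.0, composite
        for sign in (1.0, -1.0):
            candidate = [c + sign * mag * x for c, x in zip(composite, seq)]
            score = pvariance(candidate)
            # Strict comparison: on a tie the positive sign is kept.
            if best_score is None or score > best_score:
                best_score, best_weight, best_sum = score, sign * mag, candidate
        composite = best_sum
        signed_weights.append(best_weight)
    return composite, signed_weights


# Two hypothetical presence-absence sequences with unit weight magnitudes.
composite, weights = compose_greedy([[1, 0, 1, 0], [0, 1, 0, 1]], [1.0, 1.0])
```

In this example, adding the second sequence with a positive sign would flatten the composite to a constant, so the greedy step picks the negative sign. Each term needs only two evaluations, which is the cost saving over a global search that the quoted passage points to.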
Regarding claim 5, Hueter in view of Ma and in view of Zhou teaches the method of claim 4, further comprising: determining, by the computing device, that the second user segment is redundant compared to the user segment; and removing, by the computing device, the second user segment from the optimal set of user segments (Hueter paras. 0083-0098, “By summing until the resulting sequence changes by less than a chosen amount when a term is added, with the change measured using vector-lengths of the sequences with an appropriate norm, such as a Cartesian norm $L_p$ where p=2 (for example, stop when Dpa and the ith term $w_i$ Dpa satisfy $|w_i D_{pa}| < \epsilon |D_{pa}|$, for a previously chosen value of $\epsilon$)… but there may be reasons not clear from the training data set to exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another.” Hueter teaches By summing until the resulting sequence changes by less than a chosen amount when a term is added (i.e. determining, by the computing device, that the second user segment is redundant compared to the user segment) exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another (i.e. and removing, by the computing device, the second user segment from the optimal set of user segments)).
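The stopping rule quoted above (stop when a term's norm is smaller than a chosen fraction of the composite's norm, with p=2) can be sketched in Python as follows. The names `l2_norm` and `sum_until_negligible` are hypothetical. The sketch assumes that the terms arrive already weighted and that a term found negligible is left out of the sum; the quoted passage does not settle that last point.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def l2_norm(vector: Sequence[float]) -> float:
    """Cartesian (L_p with p=2) length of a sequence."""
    return math.sqrt(sum(x * x for x in vector))


def sum_until_negligible(weighted_terms: Iterable[Sequence[float]],
                         epsilon: float) -> Tuple[List[float], int]:
    """Accumulate terms until |term| < epsilon * |composite|.

    Returns the composite sequence and the number of terms actually summed.
    Assumption: the first negligible term is not added to the composite.
    """
    composite: List[float] = []
    used = 0
    for term in weighted_terms:
        if used > 0 and l2_norm(term) < epsilon * l2_norm(composite):
            break
        if used == 0:
            composite = list(term)
        else:
            composite = [c + t for c, t in zip(composite, term)]
        used += 1
    return composite, used


# Hypothetical weighted terms of rapidly decreasing size.
terms = [[3.0, 4.0], [0.3, 0.4], [0.003, 0.004]]
composite, used = sum_until_negligible(terms, epsilon=0.01)
```

Here the third term has length of about 0.005, below one percent of the composite's length of about 5.5, so the summation stops after two terms.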
Regarding claim 6, Hueter in view of Ma and in view of Zhou teaches the method of claim 4, further comprising processing the user segment to remove redundant or overlapping conditions (Hueter paras. 0083-0098, “[B]ut there may be reasons not clear from the training data set to exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another.” Hueter teaches exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another (i.e. processing the user segment to remove redundant or overlapping conditions)).1
Regarding claim 7, Hueter in view of Ma and in view of Zhou teaches the method of claim 4, wherein determining an optimal set of user segments further comprises: creating a set of user segments, the set of user segments including the user segment and the second user segment; determining a first metric for the user segment and a second metric for the second user segment, the first metric and the second metric based on user segment precision and coverage; based on the first metric being higher than the second metric, retaining the user segment in the optimal set of user segments and removing the second user segment from the set of user segments; and providing the set of user segments as the optimal set of user segments (Hueter paras. 0083-0098, “[A] composition of the sequences for each individual item (or a collection of subsets of items) that aims to increase the significance of the resulting sequence instead of possibly decreasing it. We can accomplish this by constructing the composition as follows: $D_{pa} = w_1 D_{pa,1} + w_2 D_{pa,2} + w_3 D_{pa,3} + \dots$
…The arithmetic signs of the weights $w_j$ are chosen so that the contribution of Dpa increases the significance of the composite sequence. Several methods may be used to select these signs: We may evaluate the cumulative sum Dpa one term at a time in the order j= 1, 2, 3....choosing the sign for $w_i$, at each step that results in the larger significance for Dpa after the ith term is included…By summing until the resulting sequence changes by less than a chosen amount when a term is added, with the change measured using vector-lengths of the sequences with an appropriate norm, such as a Cartesian norm $L_p$ where p=2 (for example, stop when Dpa and the ith term $w_i$ Dpa satisfy $|w_i D_{pa}| < \epsilon |D_{pa}|$, for a previously chosen value of $\epsilon$
                    )… but there may be reasons not clear from the training data set to exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another.” Hueter teaches A composition of the sequences for each individual item (or a collection of subsets of items) that aims to increase the significance of the resulting sequence instead of possibly decreasing it. We can accomplish this by constructing the composition as follows:                         
                            
                                
$D_{pa} = w_1 D_{pa,1} + w_2 D_{pa,2} + w_3 D_{pa,3} + \dots$
(i.e. creating a set of user segments, the set of user segments including the user segment and the second user segment) The arithmetic signs of the weights $w_j$ are chosen so that the contribution of Dpa increases the significance of the composite sequence. Several methods may be used to select these signs: We may evaluate the cumulative sum Dpa one term at a time in the order j= 1, 2, 3....choosing the sign for $w_i$, at each step that results in the larger significance for Dpa after the ith term is included (i.e. determining a first metric for the user segment and a second metric for the second user segment, the first metric and the second metric based on user segment precision and coverage) By summing until the resulting sequence changes by less than a chosen amount when a term is added, with the change measured using vector-lengths of the sequences with an appropriate norm, such as a Cartesian norm $L_p$ where p=2 (for example, stop when Dpa and the ith term $w_i$ Dpa satisfy $|w_i D_{pa}| < \epsilon |D_{pa}|$, for a previously chosen value of $\epsilon$) (i.e. based on the first metric being higher than the second metric, retaining the user segment in the optimal set of user segments) but there may be reasons not clear from the training data set to exclude certain variables from the model, such as because it is known to the operator that a particular variable may not be readily available in the operational system or that one variable is redundant to another (i.e. and removing the second user segment from the set of user segments; and providing the set of user segments as the optimal set of user segments)).
Regarding claim 9, Hueter in view of Ma and in view of Zhou teaches the method of claim 1, further comprising: identifying, by the computing device, a set of ranges for numerical values within the training data; and converting numerical data into categorical data by replacing a numerical value with a range (Hueter paras. 0057-0063, “Collectively, the set of m-dimensional attribute tuples Z may be transformed to an n-dimensional space of real-valued n-tuples Q, with a function Q=f(Z). The invention may then be applied to the data using the transformed attribute-tuples Q in place of the original attribute-tuples Z. The function f that effects the transformation can be defined so as to achieve any one of a number of useful results: The function can incorporate the mapping of categorical or binary variables to real numbers, as described above, thereby allowing software implementations to treat all attributes consistently.” Hueter teaches the set of m-dimensional attribute tuples Z may be transformed to an n-dimensional space of real-valued n-tuples Q, with a function Q=f(Z) (i.e. identifying, by the computing device, a set of ranges for numerical values within the training data) The function can incorporate the mapping of categorical or binary variables to real numbers (i.e. and converting numerical data into categorical data by replacing a numerical value with a range)).
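For reference, the claim 9 limitation of replacing a numerical value with a range can be illustrated by the short Python sketch below. It illustrates the claim language only and is not drawn from Hueter or from the application: the function name `to_range_label`, the label format, and the example bin edges are all hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def to_range_label(value: float, edges: Sequence[float]) -> str:
    """Replace a numerical value with the label of the range that contains it.

    `edges` are strictly increasing range boundaries. Each range includes its
    lower edge; the last range also includes the final upper edge.
    """
    if len(edges) < 2 or any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing with at least two values")
    if not edges[0] <= value <= edges[-1]:
        raise ValueError("value lies outside the identified ranges")
    index = min(bisect_right(edges, value) - 1, len(edges) - 2)
    return f"{edges[index]}-{edges[index + 1]}"


# Hypothetical age ranges identified from training data.
age_edges = [0, 18, 35, 65, 120]
labels = [to_range_label(age, age_edges) for age in (17, 18, 42, 120)]
```

Each numerical age is replaced by a categorical label such as "35-65", which can then be treated like any other categorical attribute.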
Referring to independent claims 10 and 16, they are rejected on the same basis as independent claim 1 since they are analogous claims.
Referring to dependent claims 11-15, they are rejected on the same basis as dependent claims 2-6 since they are analogous claims.
Referring to dependent claims 17-20, they are rejected on the same basis as dependent claims 2-5 since they are analogous claims.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Hueter et al. US 2016/0019587 A1 (“Hueter”) in view of Ma, Haiying. "A Study on Customer Segmentation for E-Commerce Using the Generalized Association Rules and Decision Tree." American Journal of Industrial and Business Management 5.12 (2015) (“Ma”) and in view of Zhou et al. "Pattern based sequence classification." IEEE Transactions on Knowledge and Data Engineering 28.5 (2015): 1285-1298 (“Zhou”) and further in view of Carvalho, Deborah R et al. "A hybrid decision tree/genetic algorithm method for data mining." Information Sciences 163.1-3 (2004) (“Carvalho”).
Regarding claim 8, Hueter in view of Ma and in view of Zhou teaches the method of claim 4, but does not teach wherein the determining the optimal set of user segments further comprises: using a genetic algorithm to select an initial population of relevant conditions and create a set of user segments; iteratively performing operations comprising: determining a fitness score of each user segment in the set of user segments, based on the fitness score, combining two of the user segments from the set of user segments, and updating the set of user segments with a new relevant condition.
However, Carvalho teaches: wherein the determining the optimal set of user segments further comprises: using a genetic algorithm to select an initial population of relevant conditions and create a set of user segments (Carvalho, pgs. 15-17, “First, GAs work with a population of candidate solutions (individuals)…Intuitively, the ability of GAs to cope with attribute interaction makes them a potentially useful solution for the problem of small disjuncts….” Carvalho teaches that GAs work with a population of candidate solutions (individuals) (i.e. using a genetic algorithm to select an initial population of relevant conditions) and are a potentially useful solution for the problem of small disjuncts (i.e. and create a set of user segments)); iteratively performing operations comprising: determining a fitness score of each user segment in the set of user segments, based on the fitness score (Carvalho, pgs. 17-20, “Let us now turn to the fitness function––i.e., to the function used to evaluate the quality of the candidate small-disjunct rule represented by an individual. In both GAs described in this paper, the fitness function is given by the formula: Fitness = (TP/(TP + FN)) * (TN/(FP + TN)) where TP, FN, TN and FP standing for the number of true positives, false negatives, true negatives and false positives––are well-known variables often used to evaluate the performance of classification rules.”), combining two of the user segments from the set of user segments, and updating the set of user segments with a new relevant condition (Carvalho, pgs. 25-27, “The pseudo-code of our GA with sequential niching is shown, at a high level of abstraction, in Fig. 6… First, it runs the GA, using TrainingSet-2 as the training data for the GA. The best rule found by the GA is added to RuleSet. Then the examples correctly covered by that rule are removed from TrainingSet-2, so that in the next iteration of the WHILE loop TrainingSet-2 will have a smaller cardinality.” Carvalho teaches that the pseudo-code of the GA with sequential niching is shown in Fig. 6 and that the best rule found by the GA is added to RuleSet (i.e. combining two of the user segments from the set of user segments and updating the set of user segments with a new relevant condition)).
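For illustration only, the fitness formula quoted from Carvalho (the product of TP/(TP + FN) and TN/(FP + TN)) could be computed as sketched below. This is a minimal sketch under stated assumptions and is not Carvalho's implementation: the names `fitness` and `score_rule`, the representation of a rule as a Python predicate, the zero-denominator handling, and the example data are all hypothetical.

```python
def fitness(tp, fn, tn, fp):
    """Product of TP/(TP + FN) and TN/(FP + TN), per the quoted formula.

    Returning 0.0 for an empty denominator is an assumption of this sketch.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (fp + tn) if (fp + tn) else 0.0
    return sensitivity * specificity


def score_rule(rule, examples):
    """Count TP, FN, TN and FP for one candidate rule, then score it.

    `rule` is a predicate over a feature dict; `examples` is a list of
    (features, is_positive) pairs. Both shapes are hypothetical.
    """
    tp = fn = tn = fp = 0
    for features, is_positive in examples:
        covered = rule(features)
        if covered and is_positive:
            tp += 1
        elif not covered and is_positive:
            fn += 1
        elif covered and not is_positive:
            fp += 1
        else:
            tn += 1
    return fitness(tp, fn, tn, fp)


# Hypothetical data: one true positive, one false negative,
# one true negative and one false positive for the rule "age < 25".
examples = [
    ({"age": 20}, True),
    ({"age": 30}, True),
    ({"age": 50}, False),
    ({"age": 22}, False),
]
score = score_rule(lambda f: f["age"] < 25, examples)
```

Because the two factors are multiplied, a candidate rule scores highly only when it both covers the positive examples and excludes the negative ones.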
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Hueter in view of Ma and in view of Zhou with the teachings of Carvalho. The motivation to do so would be to reduce the errors associated with small disjuncts for better accuracy (Carvalho, pg. 15, “At first glance, perhaps one could ignore small disjuncts, since they tend to be error prone and seem to have a small impact on predictive accuracy. However, small disjuncts are actually quite important in data mining and should not be ignored. The main reason is that, even though each small disjunct covers a small number of examples, the set of all small disjuncts can cover a large number of examples. For instance [10] reports a real-world application where small disjuncts cover roughly 50% of the training examples. In such cases we need to discover accurate small-disjunct rules in order to achieve a good classification accuracy rate.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lakkaraju, Himabindu, Stephen H. Bach, and Jure Leskovec. "Interpretable decision sets: A joint framework for description and prediction." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016 (proposes interpretable decision sets, a framework for building predictive models that are highly accurate yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are easily interpretable.)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Adam Clark Standke
Assistant Examiner
Art Unit 2129



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.