DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 11/6/2020 have been fully considered but they are not persuasive. Applicant’s first argument is as follows:
“Ray teaches a method to fill the missing values with estimated ones. Ray further discloses calculation of mean or median for all non-missing values of that variable to replace missing values with mean or median. However, Ray does not disclose that the mean or median value is calculated for replacing missing values in case a number of missing values of the variable is within a threshold level. As recited by claim 1, the applicant claims that the variables with a number of missing values within a threshold level will be replaced using a median value for one or more missing values to prepare a data set. Therefore, claim 1 is not taught, suggested or rendered obvious over the combination of Zhang, Vaughan and Ray.”

Although Applicant is correct that Ray does not teach a threshold level, the limitation of “preparing a dataset by using a median value for one or more missing values of variables in case a number of missing values of the variables is within a threshold level” is not explicitly taught by Applicant’s disclosure, as further discussed in the §112 rejection below.  
The only mention of a threshold in the Specification refers to intelligent threshold selection via threshold selection step 108, which appears to be distinct from “preparing a dataset by using a median value for one or more missing values” since missing value imputation is part of the data preparation step 100 ([0017]).  The threshold selection appears to only relate to the assigning lead score values to either 1 .
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the enablement requirement. The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. The limitation of claims 1 and 10 of “preparing a dataset by using a median value for one or more missing values of variables in case a number of missing values of the variables is within a threshold level” is not enabled by Applicant’s disclosure since the details of how the threshold level is determined are not disclosed, as discussed in the Response to Arguments above.
Looking at Figure 2, Examiner believes the claimed threshold refers to “variable with acceptable % of missing values are inputted with zeros or median” under Missing Value Imputation.  However, no guidance is provided as to what is an acceptable .
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al (US 2017/0132516) in view of Vaughan et al (US 2017/0069216), Ray (NPL: “7 Steps of Data Exploration & Preparation – Part 2”) and Kala et al (US 2016/0019267).
For claim 1, Zhang teaches a computer-implemented method (see [0045]) for lead scoring (estimating likelihood that a product will be purchased, see Abstract and [0017]), comprising: 
performing random sampling of the data set (via 160 of Figure 1, see [0031] and [0043]) to generate training and test data (see [0040]); 
building a model based on the training and test data (320 of Figure 3, see [0051], as explained below); 
It is noted that Zhang teaches in claim 1 that “the combined sample list [comprises] a random sample of the imbalanced large scale data set” and further teaches “feeding the combined sample list into a prediction model enabling the prediction model to provide predictive capabilities with negligible variance for the imbalanced large scale data set” but is silent as to the details of the prediction model.
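For illustration only, the random sampling step that generates training and test data can be sketched as follows. This is a generic seeded shuffle-and-split and is not taken from Zhang; the function name `random_train_test_split`, the `test_fraction` parameter, and the seed value are assumptions introduced for the example.

```python
import random


def random_train_test_split(records, test_fraction=0.3, seed=0):
    """Randomly partition records into (training, test) lists.

    Hypothetical helper for illustration; not Zhang's sampling procedure.
    """
    rng = random.Random(seed)          # local generator so the split is repeatable
    shuffled = list(records)           # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten placeholder record identifiers split 70/30.
train, test = random_train_test_split(range(10), test_fraction=0.3, seed=42)
```

Every record lands in exactly one of the two lists, so the training and test data are disjoint and together cover the sampled data set.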
Zhang does not distinctly disclose:
preparing a data set by assigning a median value to variables having missing values wherein the variables with missing values are within a threshold level
refining the model by using a true positive rate and a true negative rate to optimize creation of a set of prioritized leads.  
However, Vaughan teaches a prediction model (110, 115, 120, 125 of Figure 2) which is refined (see [0097]) by using a true positive rate (sensitivity) and a true negative rate (specificity).  
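Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to implement Zhang’s prediction model using Vaughan’s prediction model to compensate for sample bias and class imbalance that may exist in Zhang’s training data (see [0097] of Vaughan).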
The combination of Zhang and Vaughan as defined above teaches refining the model by using a true positive rate and a true negative rate (see [0097] of Vaughan) to optimize creation of a set of prioritized leads (identifying valuable customers or estimating the likelihood that a product will be purchased, see Zhang’s Abstract).
Further, the combination of Zhang and Vaughan does not distinctly disclose preparing a data set by assigning a median value as claimed.
However, Ray teaches that median imputation is a method to fill in missing values with estimated ones by “replacing the missing data for a given attribute by the mean or median (quantitative attribute) …of all known values of that variable” (¶2 of “Methods to treat Missing Values”). Therefore, Ray teaches preparing a data set by using a median value for one or more missing values of variables.
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to assign a median value (i.e., median imputation) to variables having missing values from the data set of the combination of Zhang and Vaughan before performing random sampling in order to estimate missing values by known methods, as evidenced by Ray.
The combination of Zhang, Vaughan and Ray does not distinctly disclose a threshold level as claimed.
However, Kala teaches in [0036] that “the data preprocessing engine 201 may automatically use imputation methods to fill the missing values on the attributes with less than the threshold missing values. The data preprocessing engine 201 may prompt the user to upload cleaner data for the attributes where the missing values are more than the threshold value.”
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use median imputation on attributes with less than a threshold number of missing values in the combination of Zhang, Vaughan and Ray as defined above since the particular known technique was recognized as part of the ordinary capabilities of one skilled in the art, as evidenced by Kala.
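As a minimal sketch of the combination discussed above, median imputation can be applied only to variables whose share of missing values falls below a threshold. The threshold value, the use of `None` to mark a missing value, the strict less-than comparison, and the function name `impute_median_below_threshold` are all assumptions made for the example; neither Ray nor Kala is cited here as specifying them.

```python
from statistics import median


def impute_median_below_threshold(rows, threshold=0.25):
    """Fill missing values (None) with the column median, but only for
    columns whose fraction of missing values is below `threshold`.

    Returns (filled_rows, skipped_columns). Hypothetical helper for illustration.
    """
    filled = [dict(row) for row in rows]      # work on copies of the input rows
    skipped = []
    for col in rows[0].keys():
        known = [row[col] for row in rows if row[col] is not None]
        missing_fraction = (len(rows) - len(known)) / len(rows)
        if missing_fraction == 0:
            continue                          # nothing to impute in this column
        if missing_fraction >= threshold or not known:
            skipped.append(col)               # too many gaps: leave untouched
            continue
        col_median = median(known)            # median of all known values
        for row in filled:
            if row[col] is None:
                row[col] = col_median
    return filled, skipped


rows = [
    {"age": 30, "income": 50.0},
    {"age": None, "income": None},
    {"age": 40, "income": None},
    {"age": 50, "income": None},
    {"age": 60, "income": 70.0},
]
filled, skipped = impute_median_below_threshold(rows, threshold=0.25)
```

In this example `age` is missing in 20% of the rows and is imputed, while `income` is missing in 60% of the rows and is reported in `skipped` rather than filled.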
For claim 10, Zhang teaches a system to score lead (estimating likelihood that a product will be purchased, see Abstract and [0017]) comprising:
perform random sampling of the data set (via 160 of Figure 1, see [0031] and [0043]) to generate training and test data (see [0040]); 
build a model based on the training and test data (320 of Figure 3, see [0051], as explained below); 
It is noted that Zhang teaches in claim 1 that “the combined sample list [comprises] a random sample of the imbalanced large scale data set” and further teaches “feeding the combined sample list into a prediction model enabling the prediction model to provide predictive capabilities with negligible variance for the imbalanced large scale data set” but is silent as to the details of the prediction model.
Zhang does not distinctly disclose:
preparation of a data set by assigning a median value to variables having missing values, wherein the variables with missing values are within a threshold level;
refine the model by using a true positive rate and a true negative rate to optimize creation of a set of prioritized leads.
However, Vaughan teaches a prediction model (110, 115, 120, 125 of Figure 2) which is refined (see [0097]) by using a true positive rate (sensitivity) and a true negative rate (specificity).  
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to implement Zhang’s prediction model using Vaughan’s prediction model to compensate for sample bias and class imbalance that may exist in Zhang’s training data (see [0097] of Vaughan).
The combination of Zhang and Vaughan as defined above teaches refining the model by using a true positive rate and a true negative rate (see [0097] of Vaughan) to optimize creation of a set of prioritized leads (identifying valuable customers or estimating the likelihood that a product will be purchased, see Zhang’s Abstract).
Further, the combination of Zhang and Vaughan does not distinctly disclose preparing a data set by assigning a median value as claimed.
However, Ray teaches that median imputation is a method to fill in missing values with estimated ones by “replacing the missing data for a given attribute by the mean or median (quantitative attribute) …of all known values of that variable” (¶2 of “Methods to treat Missing Values”). Therefore, Ray teaches preparation of a data set by using a median value for one or more missing values of variables.
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to assign a median value (i.e., median imputation) to variables having missing values from the data set of the combination of Zhang and Vaughan before performing random sampling in order to estimate missing values by known methods, as evidenced by Ray.
The combination of Zhang, Vaughan and Ray does not distinctly disclose a threshold level as claimed.
However, Kala teaches in [0036] that “the data preprocessing engine 201 may automatically use imputation methods to fill the missing values on the attributes with less than the threshold missing values. The data preprocessing engine 201 may prompt the user to upload cleaner data for the attributes where the missing values are more than the threshold value.”
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use median imputation on attributes with less than a threshold number of missing values in the combination of Zhang, Vaughan and Ray as defined above since the particular known technique was recognized as part of the ordinary capabilities of one skilled in the art, as evidenced by Kala.
Claims 2, 3, 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Vaughan, Ray and Yahalom et al (US 2012/0202226).
For claims 2 and 11, Zhang as modified by Vaughan and Ray teaches all of the limitations of claims 1 and 10 as cited above, but does not distinctly disclose:
                        
$$TPR = \frac{\text{Number of Enquiries Predicted as Potential Orders (i.e., True Positives)}}{\text{Number of Enquiries Actually Converted to Orders (i.e., True Positives + False Negatives)}}$$
However, Yahalom teaches in [0129]-[0130] that sensitivity (i.e., true positive rate) is defined as: 
                        
$$TPR = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$
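Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to calculate TPR in the combination of Zhang, Vaughan and Ray as taught by Yahalom since it was recognized as part of the ordinary capabilities of one skilled in the art, as evidenced by Yahalom.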
For claims 3 and 12, Zhang as modified by Vaughan and Ray teaches all of the limitations of claims 1 and 10 as cited above, but does not distinctly disclose:
                        
$$TNR = \frac{\text{Number of Enquiries Predicted as Potential Drops (i.e., True Negatives)}}{\text{Number of Enquiries Actually Converted to Drops (i.e., True Negatives + False Positives)}}$$
However, Yahalom teaches in [0129]-[0130] that specificity (i.e., true negative rate) is defined as: 
                        
$$TNR = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$$
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to calculate TNR in the combination of Zhang, Vaughan and Ray as taught by Yahalom since it was recognized as part of the ordinary capabilities of one skilled in the art, as evidenced by Yahalom.
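The sensitivity and specificity definitions attributed to Yahalom above can be illustrated with a short calculation. The label encoding (1 for an enquiry converted to an order, 0 for a drop), the sample data, and the function names are assumptions introduced for the example.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = order, 0 = drop)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp, fn, tn, fp


def true_positive_rate(tp, fn):
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)


def true_negative_rate(tn, fp):
    """Specificity: TN / (TN + FP)."""
    return tn / (tn + fp)


# Made-up outcomes for ten enquiries.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fn, tn, fp = confusion_counts(actual, predicted)
tpr = true_positive_rate(tp, fn)   # 3 / (3 + 1) = 0.75
tnr = true_negative_rate(tn, fp)   # 4 / (4 + 2), approximately 0.667
```

The denominators are the enquiries that actually converted to orders (for TPR) and the enquiries that actually dropped (for TNR), matching the claimed formulas term by term.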
Claims 4, 5, 13 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Vaughan, Ray and Beniwal et al (NPL document entitled “Classification and Feature Selection Techniques in Data Mining”).
For claims 4 and 13, Zhang as modified by Vaughan and Ray teaches all of the limitations of claims 1 and 10 as cited above, but does not distinctly disclose the data set is further filtered to remove variables based on their usefulness.
However, Beniwal teaches data preprocessing and feature selection (see “3. Feature Selection”) which comprises removing irrelevant attributes via filter approach, wherein “a feature relevance score is calculated, and low-scoring features are removed. The subset of features left after feature removal is presented as input to the classification algorithm” (2nd
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use Beniwal’s filter approach to remove variables in the data set of the combination of Zhang and Vaughan in order to avoid overfitting, improve model performance and to provide faster and more cost-effective models (see 1st ¶ of “3. Feature Selection”).
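A filter approach of the kind quoted from Beniwal (score each feature, remove the low-scoring ones, pass the rest to the classifier) can be sketched as follows. The choice of absolute Pearson correlation with the target as the relevance score, the cutoff `min_score`, and the sample data are assumptions for illustration; Beniwal is not cited here as prescribing any particular score.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_features(features, target, min_score=0.5):
    """Keep features whose relevance score (assumed: |Pearson r|) is at least min_score."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    kept = [name for name, score in scores.items() if score >= min_score]
    return kept, scores


features = {
    "visits":   [1, 2, 3, 4, 5, 6],
    "noise":    [3, 1, 3, 1, 3, 1],
    "constant": [7, 7, 7, 7, 7, 7],
}
target = [0, 0, 0, 1, 1, 1]   # 1 = converted, 0 = not converted (made-up labels)

kept, scores = filter_features(features, target, min_score=0.5)
```

Only the features in `kept` would be presented as input to the classification algorithm; the scoring step is independent of whichever classifier is used afterwards.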
For claims 5 and 14, Zhang as modified by Vaughan and Ray teaches all of the limitations of claims 1 and 10 as cited above, but does not distinctly disclose identifying independent variables and dependent variables.
However, Beniwal teaches data preprocessing and feature selection (see “3. Feature Selection”) using a wrapper approach which requires identifying independent variables and dependent variables (“advantages of wrapper approaches include…the ability to take into account feature dependencies”, see “3. Feature Selection”).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use Beniwal’s wrapper approach to remove variables in the data set of the combination of Zhang, Vaughan and Ray in order to avoid overfitting, improve model performance and to provide faster and more cost-effective models (see 1st ¶ of “3. Feature Selection”).
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Vaughan, Ray, Beniwal and Chen (NPL: “Advances in Knowledge Discovery and Data Mining”).
For claims 6 and 15, Zhang as modified by Vaughan and Ray teaches all of the limitations of claim 1 as cited above and Zhang as modified by Vaughan, Ray and 
However, Beniwal teaches:
creating new variables (by correcting errors, see Data Cleaning of “2. Data Preprocessing”) and dummy variables (see Data Integration of “2. Data Preprocessing”); and 
grouping the variables based on their conversion levels (see Discretization of “2. Data Preprocessing”).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to use Beniwal’s data preprocessing on the data set of the combination of Zhang, Vaughan and Ray in order to clean the data (see Data Cleaning of “2. Data Preprocessing”) and improve comprehensibility (see Discretization of “2. Data Preprocessing”).
The combination of Zhang, Vaughan, Ray and Beniwal does not distinctly disclose:
ignoring variables with missing values up to a threshold percentage level.
However, Chen teaches selecting variables, which includes ignoring variables with missing values up to a threshold percentage level (see 1st ¶ of “I. Wisconsin Breast Cancer Diagnostic Data Set”, page 98).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to ignore variables with missing values up to a threshold percentage level within the data set of the combination Zhang and Vaughan in order to .
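For illustration only, one possible reading of ignoring variables based on a threshold percentage of missing values may be sketched as follows. The reading adopted here (a variable is ignored when its share of missing values exceeds the threshold), the representation of a missing value as None, and all names are assumptions made for this sketch and are not taken from Chen or from the claims.

```python
def missing_fraction(values):
    """Return the fraction of entries that are missing (represented here as None)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v is None) / len(values)


def keep_variables(columns, max_missing):
    """Keep variables whose missing fraction does not exceed max_missing.

    Assumed reading: variables above the threshold are ignored.
    """
    return {
        name: values
        for name, values in columns.items()
        if missing_fraction(values) <= max_missing
    }


# Hypothetical data: "age" is 25% missing, "fax" is 75% missing.
columns = {"age": [30, None, 41, 25], "fax": [None, None, None, "x"]}
kept = keep_variables(columns, max_missing=0.5)
```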
Claims 7-9 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Vaughan, Ray and Pinto et al (US 2005/0234753).
For claims 7 and 16, Zhang as modified by Vaughan and Ray teaches all of the limitations of claims 1 and 10 as cited above, but does not distinctly disclose the limitations of claims 7 and 16.
However, Pinto teaches validating the built model comprising: 
determining a model from a plurality of models using a lift chart (see [0003]); and 
performing a concordance test to validate the model (see [0198]).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to incorporate a graphical user interface which displays a lift chart and concordance information to allow an “analyst [to use] his experience and skill to create a custom model using available model building software applied to currently available data” (see [0002]).
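For illustration only, the two validation measures named above may be sketched under their commonly used definitions: concordance as the share of (responder, non-responder) pairs in which the responder receives the higher score, and lift as the response rate among the top-scored fraction divided by the overall response rate. These definitions and all names are assumptions made for this sketch; they are not asserted to be the specific computations disclosed by Pinto.

```python
def concordance(scores, labels):
    """Fraction of (positive, negative) pairs where the positive case has the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = len(pos) * len(neg)
    if pairs == 0:
        raise ValueError("need at least one positive and one negative case")
    # Ties are counted as not concordant in this sketch.
    return sum(1 for p in pos for n in neg if p > n) / pairs


def lift_at(scores, labels, fraction):
    """Response rate in the top-scored fraction divided by the overall response rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical scores from a candidate model and the observed outcomes.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
model_concordance = concordance(scores, labels)
top_quarter_lift = lift_at(scores, labels, 0.25)
```

Plotting lift_at over a range of fractions for each candidate model gives the curves of a lift chart, from which one model may be selected.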
For claims 8 and 17, Zhang as modified by Vaughan, Ray and Pinto teaches all of the limitations of claims 7 and 16 as cited above and Pinto further teaches:
succeeding the validation of the built model:
prediction of lead scores for the data set (via “Select Validation Dataset” of Fig. 25A, see [0205]); and 
tracking the predicted lead scores against an actual data
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to predict lead scores and track lead scores in the combination of Zhang, Vaughan and Ray as taught by Pinto in order to allow a user to generate, update, change, review and deploy models at a low cost with better results (see [0035]).
For claims 9 and 18, Zhang as modified by Vaughan, Ray and Pinto teaches all of the limitations of claims 8 and 17 as cited above and Pinto further teaches:
building a new model (via Validate model, Figure 25A) or updating the built model (via Reconsider Model, Figure 25A) based on a determination if the predicted lead scores are below a threshold (as determined by Figures 25B-C).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL CALRISSIAN PUENTES whose telephone number is (571)270-5070.  The examiner can normally be reached on M-F 9-6:30 (flex).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.