Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding the independent claims, claims 1, 9, and 17 are rejected as indefinite for two reasons. First, claims 1, 9, and 17 are rejected because the claims recite (emphasized): "…obtaining a set of structured data including individual records having a set of data features and corresponding data values, wherein processing the set of structured data by a machine learning model results in a proposed action corresponding to an individual record;…"  This limitation is not clear because processing the set of structured data is passively recited and it is not clear if the scope of this limitation requires processing the set of structured data or if this limitation is only reciting the intended use of the set of structured data (see fn. 1).  For the purposes of analyzing the claim set, Examiner is interpreting the limitation as requiring processing the set of structured data, i.e. as (emphasized): "…processing the set of structured data by a machine learning model, which results in a proposed action corresponding to an individual record;…"
Second, similarly, claims 1, 9, and 17 are rejected because the claims recite (emphasized): "…generating a set of perturbed records for arbitrary indicator identification based on value variances for a target data feature of the reduced dataset, the arbitrary indicator identification being a test of the machine learning model…"  This limitation is not clear because the arbitrary indicator identification being a test is passively recited and it is not clear if the scope of this limitation requires testing the machine learning model or if this limitation is only reciting the intended use of the arbitrary indicator identification (see fn. 2).  For the purposes of analyzing the claim set, Examiner is interpreting the limitation as requiring testing the machine learning model, i.e. as (emphasized): "…generating a set of perturbed records for arbitrary indicator identification based on value variances for a target data feature of the reduced dataset, testing the machine learning model via the arbitrary indicator identification…"
Accordingly, claims 1, 9, and 17 are rejected as indefinite under 112(b). Claims 2-8, 10-16, and 18-20 do not clarify this issue and accordingly are rejected due to their dependencies.
Claims 7 and 15 are further rejected as indefinite for two reasons.  First, claims 7 and 15 are rejected because the claims recite (emphasized) "…identifying the value variances in a data feature table associating the at least one data feature with the value variances."  There is insufficient antecedent basis for this limitation.  Further, this limitation is not clear because the claims introduce a "set of data features" claim element and a "data feature table" claim element.  It is not clear if or how the "at least one data feature" relates to the "set of data features" or "data feature table" claim elements.
Second, claims 7 and 15 are rejected because the claims recite (emphasized) "…identifying the value variances in a data feature table associating the at least one data feature with the value variances."  This limitation is not clear because it is not clear whether the limitation should be construed as requiring that the data feature table associate data features with the value variances, or as identifying the value variances by associating the data feature with the value variances.  Alternatively, it is not clear if this limitation should be construed as two separate steps (i.e. an identification step and an association step).  For the purposes of analyzing the claim set, Examiner is interpreting the limitation as identifying value variances based on a data feature table.
Claims 8, 16 and 20 are further rejected as indefinite because the claims recite (emphasized) "…providing the at least one data feature to a cognitive system."  There is insufficient antecedent basis for this limitation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Watson et al., US Pub. No. 2020/0012891, herein referred to as "Watson", further in view of Pendar et al., US Pub. No. 2019/0087248, herein referred to as "Pendar".
Regarding claim 1, Watson teaches:
obtaining a set of structured data including individual records having a set of data features and corresponding data values (receives original data set including various customer information, ¶[0025] and Fig. 1, ref. char. 105), 
wherein processing the set of structured data by a machine learning model results in a proposed action corresponding to an individual record (model predicts customer behavior to make marketing recommendations, ¶[0003]; see also ¶[0057] discussing predictions; and e.g. ¶¶ and Figs. 8-10 discussing training models); 
generating a set of perturbed records for arbitrary indicator identification based on value variances for a target data feature of the reduced dataset (generates synthetic data using fictitious information, ¶[0027] and Fig. 1, ref. char. 110; see fn. 3), 
the arbitrary indicator identification being a test of the machine learning model (synthetic data is used to evaluate model, e.g. ¶¶[0058], [0059]); 
and determining, according to the set of perturbed records, a set of arbitrary indicators associated with the proposed action corresponding to the individual record (determines root mean square for each evaluation data sample, ¶[0030], to determine if synthetic data is sufficient to be used in production, ¶¶[0030], [0032]).
However Watson does not teach but Pendar does teach:
generating a correlation matrix incorporating the set of data features and the corresponding data values (forms correlation matrix, ¶¶[0026], [0040]); 
identifying, with reference to the correlation matrix, a set of data feature pairs as duplicative of one another according to a correlation criterion (identifies weighted variables that are highly correlated to one another, ¶¶[0026], [0040]); 
removing, from the set of data features, one data feature of each identified data feature pair to generate a reduced dataset (removes correlated values to reduce number of variables, ¶¶[0026], [0040]).
Further, it would have been obvious at the time of filing to combine the customer behavior modelling of Watson with the reduction of variables based on a correlation matrix, as taught by Pendar, because Pendar explicitly suggests reducing variables based on a correlation matrix to reduce the amount of data that needs to be analyzed, ¶¶[0026], [0040]; see also MPEP 2143.I.G.
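For illustration only, the correlation-based reduction recited in the three limitations above can be sketched as follows. This is a minimal sketch and not a characterization of Pendar's or Applicant's actual implementation. The Pearson coefficient, the 0.9 threshold, the choice to drop the second feature of each pair, and the names `pearson`, `correlation_matrix`, `reduce_features`, and `records` are all assumptions introduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each unordered pair of feature names to its correlation coefficient."""
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


def reduce_features(columns, threshold=0.9):
    """Drop one feature (here, the second) of each pair with |r| >= threshold."""
    dropped = set()
    for (a, b), r in correlation_matrix(columns).items():
        if a in dropped or b in dropped:
            continue  # pair already resolved by an earlier removal
        if abs(r) >= threshold:
            dropped.add(b)
    return {name: vals for name, vals in columns.items() if name not in dropped}


# Hypothetical structured data: one list of values per data feature.
records = {
    "income": [40, 55, 70, 85, 100],
    "spending": [20, 27.5, 35, 42.5, 50],  # exactly half of income
    "age": [25, 61, 33, 48, 29],
}
reduced = reduce_features(records)
print(sorted(reduced))
```

In this hypothetical data, "spending" is a linear function of "income", so the pair meets the correlation criterion and one member is removed, while "age" is retained.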
Regarding claim 2, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
taking the proposed action (marketer follows the recommendations the model generated, ¶[0003]); 
wherein: the set of arbitrary indicators includes a count of indicators below a threshold count (sufficiency of synthetic data is based on a summed error being less than a threshold, ¶[0030]).
Regarding claim 3, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
notifying a user of the set of arbitrary indicators (users evaluate process to determine if synthetic data is sufficient, ¶[0030], and plots function in order to help users visualize the data comparison, ¶[0056] and Figs. 6, 7); 
wherein: the set of arbitrary indicators includes a count of indicators above a threshold count (if summed error is greater than a threshold criterion, synthetic data is not sufficient, ¶[0030]).  
Regarding claim 4, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
wherein the set of arbitrary indicators is made up of binary features (synthetic data is determined to be sufficient or not, e.g. ¶¶[0030], [0032].  This would be a binary feature because it is either sufficient or not).
Regarding claim 5, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
wherein each arbitrary indicator is associated with a unique output of the machine learning model when processing the individual records (models are applied to evaluation data set, ¶[0029], and results of the application are analyzed to determine sufficiency, ¶¶[0029]-[0030] and Fig. 10, ref. chars. 1025, 1030, 1035 and ¶[0060]).
Regarding claim 6, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
wherein each data feature of the set of data features includes a data field classification within the set of structured data (customer information includes various fields like name, address, phone number, email address, bank account information, investment account information, spending information, ¶[0025].  These would be data field classifications because they are classifying the type of stored data).
Regarding claim 7, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
identifying the value variances in a data feature table associating the at least one data feature with the value variances (generates fictitious biographical information based on real information, ¶[0027]).  
Regarding claim 8, the combination of Watson and Pendar teaches all the limitations of claim 1 and Watson further teaches:
providing the at least one data feature to a cognitive system, the cognitive system including a domain-specific knowledge corpus; and receiving from the cognitive system the value variances derived from the domain-specific knowledge corpus (randomly generates fictional information similar to the information contained within original data and generates fictitious biographical information based on real information, ¶[0027]).  

Regarding claim 9, Watson teaches:
a set of storage device(s); and computer code stored collectively in the set of storage device(s), with the computer code including data and instructions to cause a processor(s) set to perform at least the following operations (¶¶[0061]-[0062] and Fig. 11):
obtaining a set of structured data including individual records having a set of data features and corresponding data values (receives original data set including various customer information, ¶[0025] and Fig. 1, ref. char. 105), 
wherein processing the set of structured data by a machine learning model results in a proposed action corresponding to an individual record (model predicts customer behavior to make marketing recommendations, ¶[0003]; see also ¶[0057] discussing predictions; and e.g. ¶¶ and Figs. 8-10 discussing training models); 
generating a set of perturbed records for arbitrary indicator identification based on value variances for a target data feature of the reduced dataset (generates synthetic data using fictitious information, ¶[0027] and Fig. 1, ref. char. 110; see fn. 4), 
the arbitrary indicator identification being a test of the machine learning model (synthetic data is used to evaluate model, e.g. ¶¶[0058], [0059]); 
and determining, according to the set of perturbed records, a set of arbitrary indicators associated with the proposed action corresponding to the individual record (determines root mean square for each evaluation data sample, ¶[0030], to determine if synthetic data is sufficient to be used in production, ¶¶[0030], [0032]).
However Watson does not teach but Pendar does teach:
generating a correlation matrix incorporating the set of data features and the corresponding data values (forms correlation matrix, ¶¶[0026], [0040]); 
identifying, with reference to the correlation matrix, a set of data feature pairs as duplicative of one another according to a correlation criterion (identifies weighted variables that are highly correlated to one another, ¶¶[0026], [0040]); 
removing, from the set of data features, one data feature of each identified data feature pair to generate a reduced dataset (removes correlated values to reduce number of variables, ¶¶[0026], [0040]).
Further, it would have been obvious at the time of filing to combine the customer behavior modelling of Watson with the reduction of variables based on a correlation matrix, as taught by Pendar, because Pendar explicitly suggests reducing variables based on a correlation matrix to reduce the amount of data that needs to be analyzed, ¶¶[0026], [0040]; see also MPEP 2143.I.G.
Regarding claim 10, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
taking the proposed action (marketer follows the recommendations the model generated, ¶[0003]); 
wherein: the set of arbitrary indicators includes a count of indicators below a threshold count (sufficiency of synthetic data is based on a summed error being less than a threshold, ¶[0030]).
Regarding claim 11, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
notifying a user of the set of arbitrary indicators (users evaluate process to determine if synthetic data is sufficient, ¶[0030], and plots function in order to help users visualize the data comparison, ¶[0056] and Figs. 6, 7); 
wherein: the set of arbitrary indicators includes a count of indicators above a threshold count (if summed error is greater than a threshold criterion, synthetic data is not sufficient, ¶[0030]).  
Regarding claim 12, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
wherein the set of arbitrary indicators is made up of binary features (synthetic data is determined to be sufficient or not, e.g. ¶¶[0030], [0032].  This would be a binary feature because it is either sufficient or not).
Regarding claim 13, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
wherein each data feature of the set of data features includes a data field classification within the set of structured data (customer information includes various fields like name, address, phone number, email address, bank account information, investment account information, spending information, ¶[0025].  These would be data field classifications because they are classifying the type of stored data).
Regarding claim 14, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
wherein each arbitrary indicator is associated with a unique output of the machine learning model when processing the individual records (models are applied to evaluation data set, ¶[0029], and results of the application are analyzed to determine sufficiency, ¶¶[0029]-[0030] and Fig. 10, ref. chars. 1025, 1030, 1035 and ¶[0060]).
Regarding claim 15, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
identifying the value variances in a data feature table associating the at least one data feature with the value variances (generates fictitious biographical information based on real information, ¶[0027]).  
Regarding claim 16, the combination of Watson and Pendar teaches all the limitations of claim 9 and Watson further teaches:
providing the at least one data feature to a cognitive system, the cognitive system including a domain-specific knowledge corpus; and receiving from the cognitive system the value variances derived from the domain-specific knowledge corpus (randomly generates fictional information similar to the information contained within original data and generates fictitious biographical information based on real information, ¶[0027]).  

Regarding claim 17, Watson teaches:
a processor(s) set; a set of storage device(s); and computer code stored collectively in the set of storage device(s), with the computer code including data and instructions to cause the processor(s) set to perform at least the following operations (¶¶[0061]-[0062] and Fig. 11):
obtaining a set of structured data including individual records having a set of data features and corresponding data values (receives original data set including various customer information, ¶[0025] and Fig. 1, ref. char. 105), 
wherein processing the set of structured data by a machine learning model results in a proposed action corresponding to an individual record (model predicts customer behavior to make marketing recommendations, ¶[0003]; see also ¶[0057] discussing predictions; and e.g. ¶¶ and Figs. 8-10 discussing training models); 
generating a set of perturbed records for arbitrary indicator identification based on value variances for a target data feature of the reduced dataset (generates synthetic data using fictitious information, ¶[0027] and Fig. 1, ref. char. 110; see fn. 5), 
the arbitrary indicator identification being a test of the machine learning model (synthetic data is used to evaluate model, e.g. ¶¶[0058], [0059]); 
and determining, according to the set of perturbed records, a set of arbitrary indicators associated with the proposed action corresponding to the individual record (determines root mean square for each evaluation data sample, ¶[0030], to determine if synthetic data is sufficient to be used in production, ¶¶[0030], [0032]).
However Watson does not teach but Pendar does teach:
generating a correlation matrix incorporating the set of data features and the corresponding data values (forms correlation matrix, ¶¶[0026], [0040]); 
identifying, with reference to the correlation matrix, a set of data feature pairs as duplicative of one another according to a correlation criterion (identifies weighted variables that are highly correlated to one another, ¶¶[0026], [0040]); 
removing, from the set of data features, one data feature of each identified data feature pair to generate a reduced dataset (removes correlated values to reduce number of variables, ¶¶[0026], [0040]).
Further, it would have been obvious at the time of filing to combine the customer behavior modelling of Watson with the reduction of variables based on a correlation matrix, as taught by Pendar, because Pendar explicitly suggests reducing variables based on a correlation matrix to reduce the amount of data that needs to be analyzed, ¶¶[0026], [0040]; see also MPEP 2143.I.G.
Regarding claim 18, the combination of Watson and Pendar teaches all the limitations of claim 17 and Watson further teaches:
taking the proposed action (marketer follows the recommendations the model generated, ¶[0003]); 
wherein: the set of arbitrary indicators includes a count of indicators below a threshold count (sufficiency of synthetic data is based on a summed error being less than a threshold, ¶[0030]).
Regarding claim 19, the combination of Watson and Pendar teaches all the limitations of claim 17 and Watson further teaches:
notifying a user of the set of arbitrary indicators (users evaluate process to determine if synthetic data is sufficient, ¶[0030], and plots function in order to help users visualize the data comparison, ¶[0056] and Figs. 6, 7); 
wherein: the set of arbitrary indicators includes a count of indicators above a threshold count (if summed error is greater than a threshold criterion, synthetic data is not sufficient, ¶[0030]).  
Regarding claim 20, the combination of Watson and Pendar teaches all the limitations of claim 17 and Watson further teaches:
providing the at least one data feature to a cognitive system, the cognitive system including a domain-specific knowledge corpus; and receiving from the cognitive system the value variances derived from the domain-specific knowledge corpus (randomly generates fictional information similar to the information contained within original data and generates fictitious biographical information based on real information, ¶[0027]).  




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kailas et al, US Pub. No. 2018/0129961 teaches a similar method of feature selection
Jiang et al, US Pub. No. 2006/0161403 teaches a similar method of feature selection
Rinivasan, Aishwaryav, "Why exclude highly correlated features when building regression model??", Aug. 23, 2019 (see fn. 6), teaches a similar method of feature selection

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRENDAN S O'SHEA whose telephone number is (571)270-1064. The examiner can normally be reached Monday to Friday 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lynda Jasmin, can be reached at (571) 272-6782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRENDAN S O'SHEA/
Examiner, Art Unit 3629

1 Please note, if this limitation is only reciting the intended use of the set of structured data then it does not further limit the scope of the claim because it is only an intended use, see MPEP 2103.I.C.
2 Please note, if this limitation is only reciting the intended use of the arbitrary indicator identification then it does not further limit the scope of the claim because it is only an intended use, see MPEP 2103.I.C.
3 Please note, Examiner finds Applicant is acting as their own lexicographer and has defined the term "perturbed records" in ¶[0038] of the Specification as filed; see MPEP 2111.01.IV.
4 Please note, Examiner finds Applicant is acting as their own lexicographer and has defined the term "perturbed records" in ¶[0038] of the Specification as filed; see MPEP 2111.01.IV.
5 Please note, Examiner finds Applicant is acting as their own lexicographer and has defined the term "perturbed records" in ¶[0038] of the Specification as filed; see MPEP 2111.01.IV.
6 Please note, this reference was cited in IDS dated Apr. 27, 2020.