DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement filed 07/07/2019 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 8, and 15 recite the limitation “classifying the plurality of data strings to respective different categories based on a loose string format”. It is unclear whether this is the same loose string format previously recited or another loose string format.  There is insufficient antecedent basis for this limitation in the claim.
Claims 2, 9, and 16 recite the limitation “a plurality of categories”. It is unclear how this plurality of categories relates to the plurality of categories in the previous claim, or if this is a new plurality.  There is insufficient antecedent basis for this limitation in the claim.
Claims 3, 10, and 17 recite the limitation “wherein matching proportion”.  There is insufficient antecedent basis for this limitation in the claim.
Claims 5, 12, and 19 recite “distribution of plurality of data strings”.  There is insufficient antecedent basis for this limitation in the claim.
Claims 4, 6-7, 11, 13-14, 18, and 20 are rejected for the same reasons by virtue of their direct or indirect dependency on rejected Claims 1, 8, and 15.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-9, 12-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Freudiger (US 2017/0124336), in view of Griffith (US 2018/0314705).
Regarding Claim 1, Freudiger teaches a method (Fig. 3) implemented in a computer system comprising a processor, memory accessible by the processor, computer program instructions stored in the memory and executable by the processor, and data stored in the memory and accessible by the processor ([0018], Fig. 1), the method comprising: 
obtaining, at the computer system, data including a plurality of data strings of a plurality of different categories ([0028-0029], Fig. 3, anonymizer receives the encrypted dataset from a data owner, which can include, in one embodiment, unencrypted attributes and encrypted data values, and accesses (block 41) a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, anonymizer can compile the dictionary of strings based on previous experience analyzing attributes, each sensitivity value represents how likely a string includes quasi-identifier or attribute that can be used to identify an individual associated with a data value for that attribute),
the data strings in each category have a same string pattern ([0031], string matching can occur via the Naïve string search algorithm or other types of pattern algorithms, to prevent incorrect matches, such as an attribute for “phone number” being matched with a dictionary entry for “social security number” due to the similarity of the term “number,” an algorithm can be trained to separate the different phrases, rules for each of the attributes can be applied, a rule for social security number can include three numbers followed by a dash, followed by two numbers, followed by a dash, followed by four numbers, in contrast, a telephone number rule can include three numbers followed by a dash, followed by three numbers, followed by a dash, followed by four numbers); 
determining a loose string format and a set of restrictions based on at least one string pattern ([0031], Fig. 3, each of the attributes in the dataset (~attribute indicates format of dataset) is then compared (block 42) with the dictionary); 
classifying the plurality of data strings to respective different categories based on a loose string format of the data strings and on the restrictions on the data strings of the different categories by determining a classification score indicating a match of a data string that matches the loose string pattern and meets the restrictions ([0031-0032], Fig. 3, attributes in the dataset are then compared (block 42) with the dictionary and one or more matches can be identified (block 43) for each of the attributes, string matching includes comparing an attribute with each of the dictionary string entries and identifying one or more entries, which includes the attribute or which is included within the attribute, attribute comparison via a similarity metric includes calculating a measure of similarity between each attribute and each dictionary string, a similarity threshold is defined and applied to each similarity measure to determine which dictionary string is most similar to an attribute, similarity measures are then used to identify which dictionary string matches or most closely matches an attribute, a threshold is applied to each similarity measure for an attribute and the dictionary string associated with a similarity measure that exceeds the threshold is identified as a match to the attribute), 
wherein the classifying utilizes restriction information of other categories when determining the matching of a category ([0031-0033], Fig. 3, both string matching and similarity metric matching compare each attribute with each dictionary string, classifying based on the attribute of every string/category in the dictionary, once a matching string has been identified, the weight associated with that string is assigned to the attribute and a predetermined threshold is applied (block 45) to the sensitivity value, if the sensitivity value exceeds the threshold, the attribute is recommended (block 46) for anonymization based on the identified sensitivity, if a match is not identified, another attribute is selected (not shown) until all the attributes in the dataset have been processed).
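For illustration only, the rule-based disambiguation of Freudiger [0031] (social security number: three digits, a dash, two digits, a dash, four digits; telephone number: three digits, a dash, three digits, a dash, four digits) and the threshold-based similarity matching of [0032] may be sketched as below. This is a minimal, hypothetical sketch, not code disclosed by Freudiger: the regular-expression form of the rules, the use of Python's difflib.SequenceMatcher ratio as the similarity metric, the 0.6 threshold, and the function names rule_match and similarity_match are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical rules modeled on the examples in Freudiger [0031].
RULES = {
    "social security number": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "phone number": re.compile(r"\d{3}-\d{3}-\d{4}"),
}


def rule_match(value: str) -> List[str]:
    """Return each dictionary entry whose rule matches the entire value."""
    return [name for name, rule in RULES.items() if rule.fullmatch(value)]


def similarity_match(attribute: str, dictionary: List[str], threshold: float = 0.6) -> Optional[str]:
    """Return the dictionary string most similar to the attribute name,
    provided its similarity exceeds the (assumed) threshold; otherwise None."""
    best, best_score = None, threshold
    for entry in dictionary:
        score = SequenceMatcher(None, attribute.lower(), entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best
```

Under these assumptions, rule_match("123-45-6789") returns ["social security number"] and rule_match("123-456-7890") returns ["phone number"], so the shared term "number" does not by itself produce a false match.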
Freudiger fails to teach decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion.  
In the same field of endeavor, Griffith teaches decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion ([0176], non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data, a detected numeric value that is more than 4 standard deviations from a mean value for a subset of data (e.g., a column of data) may be deemed “an outlier” or “out-of-range,” and, thus, deemed non-compliant with a range of valid numeric values).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification based on comparison with a dictionary of entries specifying string formats and rules to follow when matching attributes, as taught in Freudiger, to further include reducing the determined match score when data is determined to be out of range from expected values, as taught in Griffith, in order to optimize the linking of datasets, remove defects in the data, and form more reliable datasets. (See Griffith [0007-0008])
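The proposed modification, reducing a match score when values fall outside the expected range, may be sketched as follows for illustration. The 4-standard-deviation test follows Griffith [0176]; the use of the population standard deviation, the proportional penalty of 0.5, and the function names out_of_range and penalized_score are assumptions not found in either reference.

```python
import statistics
from typing import List


def out_of_range(values: List[float], k: float = 4.0) -> List[bool]:
    """Flag values more than k standard deviations from the column mean
    (k = 4 mirrors the example in Griffith [0176])."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mean) > k * sd for v in values]


def penalized_score(base_score: float, values: List[float],
                    penalty: float = 0.5, k: float = 4.0) -> float:
    """Hypothetical combination: decrease a classification score in
    proportion to the fraction of out-of-range values."""
    flags = out_of_range(values, k)
    fraction = sum(flags) / len(flags)
    return base_score * (1 - penalty * fraction)
```

For a column of ninety-nine values of 10.0 and one value of 1000.0, only the last value is flagged, and a base score of 1.0 is reduced to approximately 0.995.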
Regarding Claim 2, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 1 above. The combination, particularly Freudiger further teaches collecting all restrictions of a plurality of categories ([0028, 0031], accesses a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, the anonymizer can compile the dictionary of strings based on previous experience analyzing attributes).  
Regarding Claim 5, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 1 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distance between a distribution of plurality of data strings and a distribution each of the category ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and determining a classification score based on the determined distances ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized).  
Regarding Claim 6, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 1 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distribution of a plurality of data strings ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset); determining a distance between the determined distribution and a plurality of known distributions of categories of data ([0038], Fig. 4, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and selecting as a category of the plurality of data strings a category from the plurality of known distributions of categories having a minimum determined distance ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized).  
Regarding Claim 7, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 6 above. The combination, particularly Freudiger further teaches wherein the determined distribution comprises one of a normal distribution or a discrete distribution, and the distance is determined according to a Kullback-Leibler distance ([0038], anonymizer then maps a distribution of the data values for each attribute in the dataset, a divergence is measured between a dataset attribute and each known probability density function, the divergence can be measured using Kullback-Leibler divergence, Jensen-Shannon divergence, or variational distance to measure a distance between the distribution of a known probability density function and the distribution of a dataset attribute).  
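For illustration, the distribution comparison of Freudiger [0038-0039], combined with the minimum-distance selection recited in Claim 6, may be sketched for discrete distributions as follows. The shared support, the epsilon floor used to avoid division by zero, the 0.1 threshold, and the function names are assumptions; Freudiger also names Jensen-Shannon divergence and variational distance as alternatives to the Kullback-Leibler divergence used here.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def empirical_distribution(values: Sequence[str], support: Sequence[str]) -> List[float]:
    """Map the data values of one attribute to a discrete distribution over the support."""
    counts = Counter(values)
    n = len(values)
    return [counts[s] / n for s in support]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """D_KL(P || Q) for discrete distributions on the same support.
    Terms with p_i = 0 contribute nothing; eps floors q_i (an assumption)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def classify_by_distribution(values: Sequence[str], known: Dict[str, List[float]],
                             support: Sequence[str], threshold: float = 0.1) -> Optional[str]:
    """Select the known category with minimum divergence (Claim 6), accepting it
    only if that divergence is below the threshold (Freudiger [0039])."""
    p = empirical_distribution(values, support)
    divergences = {name: kl_divergence(p, q) for name, q in known.items()}
    best = min(divergences, key=divergences.get)
    return best if divergences[best] < threshold else None
```

With support ["A", "B", "C"] and hypothetical known distributions {"category_x": [0.8, 0.1, 0.1], "category_y": [1/3, 1/3, 1/3]}, eight "A" values plus one "B" and one "C" are classified as "category_x", while ten "B" values return None because even the closest divergence (ln 3, about 1.10) exceeds the threshold.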
Regarding Claim 8, Freudiger teaches a system comprising a processor, memory accessible by the processor, computer program instructions stored in the memory and executable by the processor, and data stored in the memory ([0018-0022], Fig. 1) and accessible by the processor to perform (Fig. 3): 
obtaining, at the computer system, data including a plurality of data strings of a plurality of different categories ([0028-0029], Fig. 3, anonymizer receives the encrypted dataset from a data owner, which can include, in one embodiment, unencrypted attributes and encrypted data values, and accesses (block 41) a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, anonymizer can compile the dictionary of strings based on previous experience analyzing attributes, each sensitivity value represents how likely a string includes quasi-identifier or attribute that can be used to identify an individual associated with a data value for that attribute), 
the data strings in each category have a same string pattern ([0031], string matching can occur via the Naïve string search algorithm or other types of pattern algorithms, to prevent incorrect matches, such as an attribute for “phone number” being matched with a dictionary entry for “social security number” due to the similarity of the term “number,” an algorithm can be trained to separate the different phrases, rules for each of the attributes can be applied, a rule for social security number can include three numbers followed by a dash, followed by two numbers, followed by a dash, followed by four numbers, in contrast, a telephone number rule can include three numbers followed by a dash, followed by three numbers, followed by a dash, followed by four numbers); 
determining a loose string format and a set of restrictions based on at least one string pattern ([0031], Fig. 3, each of the attributes in the dataset (~attribute indicates format of dataset) is then compared (block 42) with the dictionary); 
classifying the plurality of data strings to respective different categories based on a loose string format of the data strings and on the restrictions on the data strings of the different categories by determining a classification score indicating a match of a data string that matches the loose string pattern and meets the restrictions ([0031-0032], Fig. 3, attributes in the dataset are then compared (block 42) with the dictionary and one or more matches can be identified (block 43) for each of the attributes, string matching includes comparing an attribute with each of the dictionary string entries and identifying one or more entries, which includes the attribute or which is included within the attribute, attribute comparison via a similarity metric includes calculating a measure of similarity between each attribute and each dictionary string, a similarity threshold is defined and applied to each similarity measure to determine which dictionary string is most similar to an attribute, similarity measures are then used to identify which dictionary string matches or most closely matches an attribute, a threshold is applied to each similarity measure for an attribute and the dictionary string associated with a similarity measure that exceeds the threshold is identified as a match to the attribute), 
wherein the classifying utilizes restriction information of other categories when determining the matching of a category ([0031-0033], Fig. 3, both string matching and similarity metric matching compare each attribute with each dictionary string, classifying based on the attribute of every string/category in the dictionary, once a matching string has been identified, the weight associated with that string is assigned to the attribute and a predetermined threshold is applied (block 45) to the sensitivity value, if the sensitivity value exceeds the threshold, the attribute is recommended (block 46) for anonymization based on the identified sensitivity, if a match is not identified, another attribute is selected (not shown) until all the attributes in the dataset have been processed).
Freudiger fails to teach decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion.  
In the same field of endeavor, Griffith teaches decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion ([0176], non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data, a detected numeric value that is more than 4 standard deviations from a mean value for a subset of data (e.g., a column of data) may be deemed “an outlier” or “out-of-range,” and, thus, deemed non-compliant with a range of valid numeric values). 
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification based on comparison with a dictionary of entries specifying string formats and rules to follow when matching attributes, as taught in Freudiger, to further include reducing the determined match score when data is determined to be out of range from expected values, as taught in Griffith, in order to optimize the linking of datasets, remove defects in the data, and form more reliable datasets. (See Griffith [0007-0008])
Regarding Claim 9, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 8 above. The combination, particularly Freudiger further teaches collecting all restrictions of a plurality of categories ([0028, 0031], accesses a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, the anonymizer can compile the dictionary of strings based on previous experience analyzing attributes).  
Regarding Claim 12, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 8 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distance between a distribution of plurality of data strings and a distribution each of the category ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and determining a classification score based on the determined distances ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized).  
Regarding Claim 13, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 8 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distribution of a plurality of data strings ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset); determining a distance between the determined distribution and a plurality of known distributions of categories of data ([0038], Fig. 4, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and selecting as a category of the plurality of data strings a category from the plurality of known distributions of categories having a minimum determined distance ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized).  
Regarding Claim 14, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 13 above. The combination, particularly Freudiger further teaches wherein the determined distribution comprises one of a normal distribution or a discrete distribution, and the distance is determined according to a Kullback-Leibler distance ([0038], anonymizer then maps a distribution of the data values for each attribute in the dataset, a divergence is measured between a dataset attribute and each known probability density function, the divergence can be measured using Kullback-Leibler divergence, Jensen-Shannon divergence, or variational distance to measure a distance between the distribution of a known probability density function and the distribution of a dataset attribute).  
Regarding Claim 15, Freudiger teaches a computer program product comprising a non-transitory computer readable storage having program instructions embodied therewith, the program instructions executable by a computer ([0018-0022], Fig. 1), to cause the computer to perform a method (Fig. 3) comprising: 
obtaining, at the computer system, data including a plurality of data strings of a plurality of different categories ([0028-0029], Fig. 3, anonymizer receives the encrypted dataset from a data owner, which can include, in one embodiment, unencrypted attributes and encrypted data values, and accesses (block 41) a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, anonymizer can compile the dictionary of strings based on previous experience analyzing attributes, each sensitivity value represents how likely a string includes quasi-identifier or attribute that can be used to identify an individual associated with a data value for that attribute), 
the data strings in each category have a same string pattern ([0031], string matching can occur via the Naïve string search algorithm or other types of pattern algorithms, to prevent incorrect matches, such as an attribute for “phone number” being matched with a dictionary entry for “social security number” due to the similarity of the term “number,” an algorithm can be trained to separate the different phrases, rules for each of the attributes can be applied, a rule for social security number can include three numbers followed by a dash, followed by two numbers, followed by a dash, followed by four numbers, in contrast, a telephone number rule can include three numbers followed by a dash, followed by three numbers, followed by a dash, followed by four numbers); 
determining a loose string format and a set of restrictions based on at least one string pattern ([0031], Fig. 3, each of the attributes in the dataset (~attribute indicates format of dataset) is then compared (block 42) with the dictionary); 
classifying the plurality of data strings to respective different categories based on a loose string format of the data strings and on the restrictions on the data strings of the different categories by determining a classification score indicating a match of a data string that matches the loose string pattern and meets the restrictions ([0031-0032], Fig. 3, attributes in the dataset are then compared (block 42) with the dictionary and one or more matches can be identified (block 43) for each of the attributes, string matching includes comparing an attribute with each of the dictionary string entries and identifying one or more entries, which includes the attribute or which is included within the attribute, attribute comparison via a similarity metric includes calculating a measure of similarity between each attribute and each dictionary string, a similarity threshold is defined and applied to each similarity measure to determine which dictionary string is most similar to an attribute, similarity measures are then used to identify which dictionary string matches or most closely matches an attribute, a threshold is applied to each similarity measure for an attribute and the dictionary string associated with a similarity measure that exceeds the threshold is identified as a match to the attribute), 
wherein the classifying utilizes restriction information of other categories when determining the matching of a category ([0031-0033], Fig. 3, both string matching and similarity metric matching compare each attribute with each dictionary string, classifying based on the attribute of every string/category in the dictionary, once a matching string has been identified, the weight associated with that string is assigned to the attribute and a predetermined threshold is applied (block 45) to the sensitivity value, if the sensitivity value exceeds the threshold, the attribute is recommended (block 46) for anonymization based on the identified sensitivity, if a match is not identified, another attribute is selected (not shown) until all the attributes in the dataset have been processed).
Freudiger fails to teach decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion.  
In the same field of endeavor, Griffith teaches decreasing the classification score if a mean restriction matching proportion is not part of a category or is a threshold amount above an expected mean restriction matching proportion ([0176], non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data, a detected numeric value that is more than 4 standard deviations from a mean value for a subset of data (e.g., a column of data) may be deemed “an outlier” or “out-of-range,” and, thus, deemed non-compliant with a range of valid numeric values).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification based on comparison with a dictionary of entries specifying string formats and rules to follow when matching attributes, as taught in Freudiger, to further include reducing the determined match score when data is determined to be out of range from expected values, as taught in Griffith, in order to optimize the linking of datasets, remove defects in the data, and form more reliable datasets. (See Griffith [0007-0008])
Regarding Claim 16, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 15 above. The combination, particularly Freudiger further teaches collecting all restrictions of a plurality of categories ([0028, 0031], accesses a dictionary of strings that refer to quasi-identifiers, such as known attributes, and sensitivity values, the anonymizer can compile the dictionary of strings based on previous experience analyzing attributes).  
Regarding Claim 19, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 15 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distance between a distribution of plurality of data strings and a distribution each of the category ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and determining a classification score based on the determined distances ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized).  
Regarding Claim 20, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claim 15 above. The combination, particularly Freudiger further teaches wherein classifying comprises: determining a distribution of a plurality of data strings ([0038], Fig. 4, block 52 the anonymizer maps a distribution of the data values for each attribute in the dataset); determining a distance between the determined distribution and a plurality of known distributions of categories of data ([0038], Fig. 4, block 53 each attribute distribution for the dataset is compared with each known probability density function, at block 54 a divergence is measured between a dataset attribute and each known probability density function to determine whether one of the probability density functions matches the distribution for that dataset attribute); and selecting as a category of the plurality of data strings a category from the plurality of known distributions of categories having a minimum determined distance ([0039], Fig. 4, block 55 if the measure of divergence is less than the threshold, a high measure of similarity exists and the distributions are considered to be a match, at block 56 the attribute of the matching probability density function is then assigned, each attribute in the dataset is compared with all of the known probability density functions in an attempt to identify the attribute of the dataset and determine which attributes should be anonymized); and wherein the determined distribution comprises one of a normal distribution or a discrete distribution, and the distance is determined according to a Kullback-Leibler distance ([0038], anonymizer then maps a distribution of the data values for each attribute in the dataset, a divergence is measured between a dataset attribute and each known probability density function, the divergence can be measured using Kullback-Leibler divergence, Jensen-Shannon divergence, or variational distance to measure a distance between the distribution of a known probability density function and the distribution of a dataset attribute).

Claims 3-4, 10-11, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Freudiger (US 2017/0124336), in view of Griffith (US 2018/0314705), and further in view of Duchon (US 2011/0106743).
Regarding Claims 3, 10, and 17, Freudiger, as modified by Griffith, teaches all aspects of the claimed invention as disclosed in Claims 2, 9, and 16 above. The combination, particularly Griffith, further teaches wherein matching proportion of a restriction and an expected mean are determined by: determining a mean restriction matching proportion of the plurality of data strings; determining an expected mean restriction matching proportion; and generating a score indicating a correspondence between the restriction matching proportion and the expected restriction matching proportion ([0176], a non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data; a detected numeric value (~mean restriction matching proportion) that is more than 4 standard deviations from a mean value for a subset of data (~expected mean) may be deemed "an outlier" or "out-of-range," and, thus, deemed non-compliant with a range of valid numeric values).
The combination fails to teach determining an expected mean using values obtained randomly from a domain of values. 
In the same field of endeavor, Duchon teaches determining an expected mean using values obtained randomly from a domain of values ([0187-0188], predictability is a measure of each model's ability to predict its own set of identified topics, three kinds of measures of accuracy were taken to compare the predicted and actual topics on a test day, to compute predictability, the modeled predictions were compared to those based on taking a random sample of D historical days to ensure that successful predictions were not simply a reflection of the same topics appearing over again).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the classification based on comparison with a dictionary of entries specifying string formats and rules to follow when matching attributes, as taught in Freudiger as modified by Griffith, to further include determination of expected values from a random sample of historical data values, as taught in Duchon, in order to ensure that predictions are an accurate reflection of the data to be expected. (See Duchon [0188].)
Regarding Claims 4, 11, and 18, Freudiger, as modified by Griffith and Duchon, teaches all aspects of the claimed invention as disclosed in Claims 3, 10, and 17 above. The combination, particularly Griffith, further teaches wherein the threshold amount is four standard deviations ([0176], a non-compliant data attribute may be referred to as a data attribute that may be non-compliant with one or more values set forth in the analyzation data; a detected numeric value that is more than 4 standard deviations from a mean value for a subset of data (e.g., a column of data) may be deemed "an outlier" or "out-of-range," and, thus, deemed non-compliant with a range of valid numeric values).
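For illustration only, the four-standard-deviation test described in Griffith [0176] can be sketched as below. This is a minimal sketch under stated assumptions: the function name and the reference column are hypothetical, and the use of the population standard deviation (statistics.pstdev) rather than the sample standard deviation is an assumption, since Griffith does not specify which is used.

```python
import statistics


def is_out_of_range(value, reference, k=4.0):
    """Flag value as non-compliant if it lies more than k standard deviations
    from the mean of the reference subset (e.g., a column of data).

    Assumption: population standard deviation; k defaults to 4 per Griffith [0176].
    """
    mean = statistics.mean(reference)
    stdev = statistics.pstdev(reference)
    if stdev == 0:
        # A constant column: any differing value is treated as out of range.
        return value != mean
    return abs(value - mean) > k * stdev


# Hypothetical reference column: mean 10.5, population stdev sqrt(1.25), about 1.118.
reference = [10, 12, 11, 9, 10, 12, 11, 9]
print(is_out_of_range(15, reference))  # True: 4.5 > 4 * 1.118
print(is_out_of_range(14, reference))  # False: 3.5 < 4 * 1.118
```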

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Rudrabhatla (US 2020/0134198) discloses data classification analysis performed on sampled data to obtain derived data characteristics; the sampled data (i.e., the data obtained in step 222) is then processed (e.g., parsed) in order to divide (or segment) that data into portions, and the individual portions or groups of portions are then classified using pattern matching rules and/or pattern matching models that specify a given pattern and a corresponding data characteristic ([0053]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARGARET G MASTRODONATO whose telephone number is (571)270-7803. The examiner can normally be reached M-F, 9:00 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Charles Appiah, can be reached at (571) 272-7904. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARGARET G MASTRODONATO/Primary Examiner, Art Unit 2641