DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 2022-02-22 has been entered.  Applicant’s amendments to the Specification have overcome the objection to the Title and have properly corrected the contradiction in Para [0033] that would have prevented the system from functioning properly.
Examiner notes that Applicant’s amendment to the independent claims, “wherein the continuous data type refers to the first data of which magnitudes of values are meaningful, and the discrete data type refers to the first data of which magnitudes of the values are meaningless”, reflects how the Examiner was already interpreting Applicant’s use of the terms “continuous” and “discrete”, as explained in the previous Non-Final Office Action mailed 2021-12-07, and thus has no impact on the mapping of prior art in the rejections under 35 USC 103.
The status of the claims is as follows:
Claims 1-20 remain pending in the application.
Claims 1 and 11 are amended.
Response to Arguments
Applicant's arguments in response to rejections under 35 USC 112(a) and (b) have been fully considered but they are not persuasive. Applicant argues on Remarks Pages 11-12 that “Therefore, even though no details are given in the Specification on how one could make any sort of determination based on this, one of ordinary skill in the art still would be enabled to 
Applicant's arguments in response to rejections under 35 USC 103 have been fully considered but they are not persuasive.  Applicant argues on Remarks Page 13 that “In other 
Applicant argues on Remarks Page 13 that “On the other hand, Valera merely discloses that the continuous variables can be distinguished … In contrast, the present application uses the continuous data type and the discrete data type to distinguish the first factors, so as to find In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Examiner Note – Interpretation of Terms
The terms “continuous” and “discrete” are well-known and have well-established meanings in the art.  A continuous set (e.g., an interval of real numbers) is uncountable, and between any two of its values there always exist infinitely many other values; thus, given one value, one cannot identify the “next” value.  A discrete set, on the other hand, is countable, so its values can be enumerated one after another.
For the following, refer to Laerd Statistics (“Types of Variable”) Pages 2-3 and Stat Trek (“Scales of Measurement”) Pages 1-2.  In statistics, there are “levels” of variables:
Nominal (Identity)
Ordinal (Identity, Magnitude)
Interval (Identity, Magnitude, Equal Intervals)
Ratio (Identity, Magnitude, Equal Intervals, Minimum Value of Zero)
One of ordinary skill in the art will appreciate that under the widely accepted mathematical definitions of “discrete” and “continuous”, the following is true:
Discrete: Nominal and Ordinal
Continuous: Interval and Ratio
However, Applicant recites in Claims 1 and 11: “wherein the continuous data type refers to the first data of which magnitudes of values are meaningful, and the discrete data type refers to the first data of which magnitudes of the values are meaningless.”  Applicant thus appears to split data types into “magnitude meaningful” and “magnitude meaningless”, effectively redefining “discrete” and “continuous” as:
“Discrete”: Nominal
“Continuous”: Ordinal, Interval, and Ratio
However, this reading assumes that “magnitude is meaningless” means that “magnitude does not exist”.  It could instead mean that a magnitude exists but does not represent a quantitative measurement in which the difference between values has any meaning (see the 112(b) rejections below).  For example, an Ordinal variable such as a level of satisfaction from 1 to 5 has a magnitude, but the difference between its values has no meaning.  As stated by Stat Trek, “With an interval scale, you know not only whether different values are bigger or smaller, you also know how much bigger or smaller they are.”  Under this second interpretation of “magnitude is meaningless”, Ordinal would be “discrete”, and Applicant would be using the conventionally accepted definitions of “discrete” and “continuous”.  The Instant Specification gives only Interval and Ratio examples for “continuous” (“production rate, yield, time, temperature, size”) and only Nominal examples for “discrete” (“machine number, gender”).  Thus, it is unclear how Applicant classifies Ordinal variables (e.g., a level of satisfaction on a scale of 1-5).

Therefore, Examiner is interpreting Applicant’s definitions of “discrete” and “continuous” as:
“Discrete”: Nominal
“Continuous”: Ordinal, Interval, and Ratio
For precision and clarity, in the remainder of this Office Action, Examiner will use the following terms to avoid confusion with the commonly accepted definitions of “discrete” and “continuous”:
“Nominal”: Applicant’s definition of “discrete”
“Magnitudinous”: Applicant’s definition of “continuous”
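For illustration only, the two classification schemes discussed above can be restated as a small lookup.  The level names and properties below come from the list of levels above; the dictionary layout and the function names conventional_type and applicant_type are hypothetical and are used only to show where the two schemes diverge.

```python
# Properties of each level of measurement, as listed above
# (Laerd Statistics and Stat Trek).
LEVELS = {
    "Nominal": {"identity"},
    "Ordinal": {"identity", "magnitude"},
    "Interval": {"identity", "magnitude", "equal_intervals"},
    "Ratio": {"identity", "magnitude", "equal_intervals", "minimum_value_of_zero"},
}


def conventional_type(level: str) -> str:
    """Conventional usage: Interval and Ratio are continuous, the rest discrete."""
    return "continuous" if "equal_intervals" in LEVELS[level] else "discrete"


def applicant_type(level: str) -> str:
    """Examiner's reading of Applicant's terms: no magnitude means nominal."""
    return "magnitudinous" if "magnitude" in LEVELS[level] else "nominal"


for level in LEVELS:
    print(f"{level:<9} {conventional_type(level):<11} {applicant_type(level)}")
```

Under these assumptions, Ordinal is the only level on which the two schemes disagree, which is the ambiguity addressed above.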

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), first paragraph:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 4 and 14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.  The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The claims recite:  “generates a third detection result for each of the first factors by analyzing a discontinuity of the first data corresponding to each of the first factors by a LabelEncoder, each of the third detection results is one of the continuous data type and the discrete data type”.  The Instant Specification states from Page 12 Line 20 to Page 13 Line 11:  “The third detection technology examines a discontinuity of the first data corresponding to each of the first factors. Specifically, the processor 13 generates a third detection result D3 for each of the first factors by analyzing a discontinuity of the first data corresponding to each of the first factors by a LabelEncoder. Each of the third detection results D3 is the data type of the corresponding first factor (i.e., the continuous data type or the discrete data type).  If the processor 13 determines, via the LabelEncoder, that the first data corresponding to a first factor has discontinuous values, the third detection result D3 of the first factor is the continuous data type (which is represented by the digit "0" in FIG. 1C). If the processor 13 
LabelEncoder is a class in the scikit-learn library documented to “Encode target labels with value between 0 and n_classes-1.” (see LabelEncoder documentation).  It is unclear how the processor “determines, via the LabelEncoder, that the first data corresponding to a first factor has discontinuous values”, or “determines, via the LabelEncoder, that the first data corresponding to a first factor is continuous”.  LabelEncoder merely maps a list of labels to a list of integer codes, one code per distinct label (e.g., (“A”, “B”, “C”, “B”) -> (0, 1, 2, 1), or (“3.14159”, “4”, “3.14159”, “5.333”) -> (0, 1, 0, 2)).  No details are given in the Specification on how any determination of data type could be made from this output, and thus one of ordinary skill in the art is not enabled to make and/or use the invention without undue experimentation.
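For illustration only, the mapping described above can be reproduced without scikit-learn.  The following Python sketch assumes LabelEncoder's documented behavior (the classes are the sorted distinct labels, and each label is replaced by its index among them); the function name label_encode is hypothetical and is not part of scikit-learn.  It shows that the output codes alone do not indicate whether the inputs were nominal labels or numeric values.

```python
def label_encode(labels):
    """Map each label to its index among the sorted distinct labels.

    Mirrors the documented behavior of scikit-learn's LabelEncoder
    (codes between 0 and n_classes - 1); this is not the library's own code.
    """
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]


print(label_encode(["A", "B", "C", "B"]))                  # [0, 1, 2, 1]
print(label_encode(["3.14159", "4", "3.14159", "5.333"]))  # [0, 1, 0, 2]
# Nominal labels and numeric strings can yield identical codes:
print(label_encode(["cat", "dog", "emu"]), label_encode(["1.5", "2.0", "7.25"]))
```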

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


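Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.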
The terms “meaningful” and “meaningless” in claims 1 and 11 are relative terms which render the claims indefinite.  The terms “meaningful” and “meaningless” are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  As explained in the “Examiner Note – Interpretation of Terms” section above, it is unclear to Examiner whether “magnitude is meaningless” means that “magnitude does not exist” or that “magnitude exists, but does not represent a quantitative measurement in which the difference between values has any meaning”.  Examiner is interpreting the phrase as “magnitude does not exist”, as also described above in “Examiner Note – Interpretation of Terms”.
Dependent claims 2-10 and 12-20 are rejected because they inherit the deficiencies of Claims 1 and 11.
Claims 4 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.  As described above in the 112(a) rejection, it is unclear what is meant by “analyzing a discontinuity…by a LabelEncoder” as recited in Claim 4 and in Instant Specification Pg. 13 Lines 1-2.  LabelEncoder is a scikit-learn class documented to “Encode target labels with value between 0 and n_classes-1.”  It is unclear how one analyzes a discontinuity by a LabelEncoder.  Thus, the metes and bounds of the claims cannot be determined because it is unclear how to use the LabelEncoder as claimed.
Examiner Note - 35 USC § 103
Examiner notes that, in the 103 rejection below, the Fisher reference was made available on arXiv on 4 Jan 2018, as shown in the left margin of the PDF included with this Office Action.  Applicant may notice the date of 18 November 2018 under the title.  Examiner contacted the authors of the paper and confirmed that 4 Jan 2018 is correct; the later date under the title results from the authors’ LaTeX source using a “current date” tag instead of a hard-coded date.  arXiv archived the compiled PDF and, when the older version was requested again, recompiled the source into a new PDF, which changed the date shown.

Claim Rejections - 35 USC § 103
Claims 1, 8-11, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Valera et al. (“Automatic Discovery of the Statistical Types of Variables in a Dataset”; hereinafter “Valera”) in view of Fisher et al. (“Model Class Reliance: Variable Importance Measures for any Machine Learning Model Class, from the ‘Rashomon’ Perspective”; hereinafter “Fisher”).
As per Claim 1, Valera teaches a storage, being configured to store a plurality of first historical records and store a plurality of second historical records of the operating environment, each of the first historical records comprising a plurality of first data corresponding to a plurality of first factors one-to-one, and each of the second historical records comprising a plurality of second data corresponding to a plurality of second factors one-to-one (Valera, Page 9, Acknowledgement, discloses:  “The code implementing the proposed method, as well as the scripts that reproduce the experiments presented in the paper, are publicly available at: https://github.com/ivaleraM/DataTypes”  Here, Valera discloses “code”, which necessarily implies the use of a computer with storage and a processor to execute it.  Valera, Page 7 Section 4.2, discloses:  “In this section, we evaluate the performance of the proposed method on seven real datasets collected from the UCI machine learning repository (Lichman, 2013). Table 1 summarizes theses datasets by providing the number of objects and attributes in the dataset, as well as how many of these attributes are discrete. In order to quantitatively evaluate the performance of the proposed method, we select at random 10% of the observations in each dataset as a held-out set and compare the predictive performance, in terms of average test log-likelihood per observation, of our method with a baseline method.”  Here, Valera discloses historical data (“seven real datasets collected from the UCI machine learning repository”), as well as first and second historical records, because a “held-out set” is a second set of historical records separated from a first set of historical records.  The data in these sets correspond to factors one-to-one, as Valera discloses that the data comprise “attributes”, which are factors.)
and a processor electrically connected to the storage, being configured to generate a first detection result for each of the first factors by analyzing a first dissimilarity degree of the first data corresponding to each of the first factors, each of the first detection results being one of a continuous data type and a discrete data type, wherein the continuous data type refers to the first data of which magnitudes of values are meaningful, and the discrete data type refers to the first data of which magnitudes of the values are meaningless (Valera, Page 9, Acknowledgement, discloses:  “The code implementing the proposed method, as well as the scripts that reproduce the experiments presented in the paper, are publicly available at: https://github.com/ivaleraM/DataTypes”  Here, Valera discloses “code”, which necessarily implies the use of a computer with storage and a processor to execute it.  Valera, Page 2 Right Column Second Paragraph, discloses:  “In contrast, in this paper we proposed a general method that allows us to distinguish among real-valued, positive real-valued and interval data as types of continuous variables, and among categorical, ordinal and count data as types of discrete variables.”  Here, Valera discloses generating, for each attribute, a detection result that is one of a nominal data type, whose magnitudes are meaningless (“categorical”), and a magnitudinous data type, whose magnitudes are meaningful (“real-valued, positive real-valued, and interval…ordinal and count”), as every subtype Valera detects falls within one of these two types.  This is done by analyzing a dissimilarity degree of the data, which includes the “first data”, as nominal and magnitudinous data are dissimilar to each other, and Valera exploits this fact to “distinguish among” them.  “Distinguishing among” entities requires analyzing some dissimilarity among them, since entities that were exactly the same could not be distinguished.)
wherein the processor further trains a data type recognition model according to the first historical records and the first detection results (Valera, Page 2 Right Column Second Paragraph, as explained above, discloses distinguishing among data types according to the first historical records and the first detection results.  Valera, Page 2 Section 3, begins:  “In this section, we introduce a Bayesian method to determine the statistical type of variable that corresponds to each of the attributes describing the objects in an observation matrix X. In particular, we propose a probabilistic model”.  Here, Valera discloses a machine learning method (“Bayesian method”) built on a “model”.  A machine learning model of this kind must be trained, i.e., fit to data, before it is used.)
determines a data type of each of the second factors by using the data type recognition model to analyze the second data corresponding to each of the second factors (Valera, Page 2 Right Column, as explained above, discloses determining a data type using a data type recognition model.  Valera, Page 7 Section 4.2, as explained above, discloses running the model on a “held-out set”, and thus discloses analyzing the second data corresponding to each of the second factors.)
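For illustration only, the held-out protocol quoted from Valera (selecting 10% of the observations at random as a held-out set) can be sketched as follows, treating each observation as a record as the rejection does, to show how one collection of records yields two disjoint sets mapped to the first and second historical records.  The function name split_held_out, the seed, and the record format are hypothetical; Valera's released code may perform the split differently.

```python
import random


def split_held_out(records, held_out_fraction=0.1, seed=0):
    """Randomly split records into (training_set, held_out_set).

    The held-out set holds round(held_out_fraction * n) randomly chosen
    records; the two sets are disjoint and together contain every record.
    """
    rng = random.Random(seed)
    n = len(records)
    held_idx = set(rng.sample(range(n), round(held_out_fraction * n)))
    training = [r for i, r in enumerate(records) if i not in held_idx]
    held_out = [r for i, r in enumerate(records) if i in held_idx]
    return training, held_out


# Hypothetical records, each with one factor.
records = [{"id": i, "temperature": 20 + i} for i in range(50)]
training, held_out = split_held_out(records)
print(len(training), len(held_out))  # 45 5
```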
However, Valera does not teach establishes a basic prediction model by a first subset of the second historical records and the data types; generates a first comparison set by rearranging the second data corresponding to a first specific factor in the first subset; establishes a first comparison prediction model by the first comparison set and the data types; obtains a basic accuracy by using a second subset of the second historical records to test the basic prediction model; obtains a first accuracy by using the second subset to test the first comparison prediction model; and determines a first degree of importance of the first specific factor by comparing the basic accuracy with the first accuracy.
For the following limitations, refer to Fisher Page 4 Section 3.1:  “To describe the reliance of a model f on the random variable $X_1$ in a population, we use the notion of a “switched” loss. Let $Z^{(a)} = (Y^{(a)}, X_1^{(a)}, X_2^{(a)})$ and $Z^{(b)} = (Y^{(b)}, X_1^{(b)}, X_2^{(b)})$ be independent random variables, each following the same distribution as $Z = (Y, X_1, X_2)$. Denote realizations of $Z^{(a)}$ and $Z^{(b)}$ by $z^{(a)} = (y^{(a)}, x_1^{(a)}, x_2^{(a)})$ and $z^{(b)} = (y^{(b)}, x_1^{(b)}, x_2^{(b)})$ respectively. Given the realizations $z^{(a)}$ and $z^{(b)}$, let $h_f(z^{(a)}, z^{(b)})$ be the loss of model f on $z^{(b)}$, if $x_1^{(b)}$ was first replaced with $x_1^{(a)}$”.
Fisher teaches establishes a basic prediction model by a first subset of the [second historical] records and the data types (Recall that Valera teaches second historical records.  Fisher, Page 4 Section 3.1, discloses the establishment of a basic prediction model (“model f”) by a first subset of records (each “$(Y, X_1, X_2)$”), and each piece of data from $X_1$ and $X_2$ must be of a particular data type, as all data has a data type.  A “subset” may be anything from one element to the entire set, and thus Fisher also discloses a first subset being used to establish the model.)
generates a first comparison set by rearranging the [second] data corresponding to a first specific factor in the first subset (Recall that Valera teaches second data.  Fisher, Page 4 Section 3.1, discloses rearranging the data of a first specific factor (“$x_1^{(b)}$ was first replaced with $x_1^{(a)}$”).)
establishes a first comparison prediction model by the first comparison set and the data types (Fisher, Page 4 Section 3.1, discloses:  “For a given prediction function f, we wish to know the expectation of this quantity across pairs in the population, $e_{\text{switch}}(f) := \mathbb{E}\, h_f(Z^{(a)}, Z^{(b)})$”.  Here, Fisher discloses establishing a comparison prediction model by running the model across the population with the switched data described above, in order to calculate a loss.)
obtains a basic accuracy by using a second subset of the [second historical] records to test the basic prediction model (Recall that Valera discloses second historical records.  Fisher, Page 4 Section 3.1, discloses:  “As a reference point, we compare $e_{\text{switch}}(f)$ against the standard expected loss when none of the variables are switched, $e_{\text{orig}}(f) := \mathbb{E}\, h_f(Z^{(a)}, Z^{(a)}) = \mathbb{E}\, L(f, Z)$”.  Here, Fisher discloses obtaining a basic accuracy by using a second subset of the records, as several examples are needed to obtain an “expected” loss, and thus a subset consisting of some or all of the original set has been used.  This uses the basic prediction model, as it is computed “when none of the variables are switched”.)
obtains a first accuracy by using the second subset to test the first comparison prediction model (Fisher, Page 4 Section 3.1, discloses:  “For a given prediction function f, we wish to know the expectation of this quantity across pairs in the population, $e_{\text{switch}}(f) := \mathbb{E}\, h_f(Z^{(a)}, Z^{(b)})$”.  Here, Fisher discloses obtaining the first accuracy, $e_{\text{switch}}$, which uses the rearranged (“switched”) set.  This uses the same subset (the “second subset”) as the basic accuracy, as Fisher continues:  “As a reference point, we compare $e_{\text{switch}}(f)$ against the standard expected loss when none of the variables are switched”.)
and determines a first degree of importance of the first specific factor by comparing the basic accuracy with the first accuracy (Fisher, Page 4 Section 3.1, discloses:  “As a reference point, we compare $e_{\text{switch}}(f)$ against the standard expected loss when none of the variables are switched”.  Here, the basic accuracy (“standard expected loss”) is compared with the first accuracy ($e_{\text{switch}}$).)
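For illustration only, the quantities quoted from Fisher can be estimated from a finite set of records by averaging the loss over all ordered pairs of distinct records (a, b), with the specific factor of record b replaced by that of record a.  The following Python sketch assumes squared-error loss, a fixed toy model, and hypothetical data and function names; it is not Fisher's code and does not reproduce Fisher's experiments.  Because these are losses, a lower value corresponds to a higher accuracy.

```python
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2


def e_orig(model, ys, xs):
    """Average loss with no factor switched (the basic prediction model)."""
    return sum(squared_loss(y, model(x)) for y, x in zip(ys, xs)) / len(ys)


def e_switch(model, ys, xs, j):
    """Average loss over ordered pairs (a, b), a != b, where factor j of
    record b is replaced by factor j of record a before predicting y_b."""
    n = len(ys)
    total = 0.0
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            x = list(xs[b])
            x[j] = xs[a][j]  # switch only the specific factor j
            total += squared_loss(ys[b], model(x))
    return total / (n * (n - 1))


def model(x):
    """Hypothetical fixed model that uses factor 0 and ignores factor 1."""
    return 2 * x[0]


xs = [(1, 5), (2, 3), (3, 9), (4, 1)]
ys = [2.5, 3.5, 6.5, 8.5]

base = e_orig(model, ys, xs)
for j in (0, 1):
    sw = e_switch(model, ys, xs, j)
    print(j, sw, sw / base)  # factor, switched loss, ratio MR(f)
```

With this data, e_orig is 0.25; switching factor 0 gives a switched loss of 14.25 (ratio 57.0), while switching the ignored factor 1 leaves the loss at 0.25 (ratio 1.0).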
Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the data type prediction of Valera with the variable importance measure of Fisher.  One would be motivated to do so to gain operating efficiency by maintaining only one algorithm to measure variable importance across all machine learning models in an organization, regardless of architecture (Fisher, Abstract:  “Thus, MCR describes reliance on a 

As per Claim 8, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches second data (see Rejection to Claim 1).   Fisher teaches generates a second comparison set by rearranging the [second] data corresponding to a second specific factor in the first subset (Fisher, Page 18 Paragraph 2, discloses:  “We analyze a dataset curated by ProPublica, of 6,216 defendants from Broward County, Florida (Larson et al., 2016). The outcome of interest is an indicator of 2-year recidivism. Of the available covariates, we consider 5 variables which we refer to as “admissible.” These variables describe an individual’s age, their prior record, and the severity of the current charge. We also consider two variables which we refer to as “inadmissible,” an individual’s race (categorical) and sex (see Table 1). To answer the above questions, we compute the empirical MR, AR, and MCR on (1) all admissible variables, and on (2) all inadmissible variables.”  Here, Fisher discloses repeating the process, which involves permuting a feature in the dataset, on a second specific factor of the first subset, the first specific factor being one of the “admissible variables” and the second being one of the “inadmissible variables”.)
establishes a second comparison prediction model by the second comparison set and the data types (Fisher repeats the process from Page 4 Section 3.1:  “For a given prediction function f, we wish to know the expectation of this quantity across pairs in the population, $e_{\text{switch}}(f) := \mathbb{E}\, h_f(Z^{(a)}, Z^{(b)})$”.  Here, Fisher discloses establishing a comparison prediction model by running the model across the population with the switched data described above, in order to calculate a loss.  This is based on a second comparison set (permuting the “inadmissible” feature) and therefore establishes a second comparison prediction model.)
obtains a second accuracy by using the second subset to test the second comparison prediction model (Fisher repeats the process from Page 4 Section 3.1:  “For a given prediction function f, we wish to know the expectation of this quantity across pairs in the population, $e_{\text{switch}}(f) := \mathbb{E}\, h_f(Z^{(a)}, Z^{(b)})$”.  Here, Fisher discloses obtaining the second accuracy, $e_{\text{switch}}$, which uses the rearranged (“switched”) set.  This uses the same subset (the “second subset”) as the basic accuracy, as Fisher continues:  “As a reference point, we compare $e_{\text{switch}}(f)$ against the standard expected loss when none of the variables are switched”.  This is based on a second comparison set (permuting the “inadmissible” feature) and therefore on a second comparison prediction model.)
and determines a second degree of importance of the second specific factor by comparing the basic accuracy with the second accuracy (Fisher repeats the process from Page 4 Section 3.1:  “As a reference point, we compare $e_{\text{switch}}(f)$ against the standard expected loss when none of the variables are switched”.  Here, the basic accuracy (“standard expected loss”) is compared with the second accuracy ($e_{\text{switch}}$).)
Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Valera and Fisher for at least the same reasons recited in Claim 1.

As per Claim 9, the combination of Valera and Fisher teaches the apparatus of Claim 8.  Fisher teaches calculates a first absolute difference between the basic accuracy and the first accuracy (Fisher, Page 21 Section 9.4, discloses:  “We choose our ratio-based definition of model reliance, $MR(f) = e_{\text{switch}}(f) / e_{\text{orig}}(f)$, so that the measure can be comparable across problems, regardless of the scale of Y. However, several existing works define VI measures in terms of differences (Strobl et al., 2008; Datta et al., 2016; Gregorutti et al., 2017), analogous to
$MR_{\text{difference}}(f) := e_{\text{switch}}(f) - e_{\text{orig}}(f)$
While this difference measure is less readily interpretable, it has several computational advantages.”  Here, Fisher discloses a difference between the basic accuracy and the first accuracy.  One of ordinary skill in the art will appreciate that this value is expected to be positive, as rearranging the data so that it no longer properly correlates with the outcome is expected to increase the error of a model rather than reduce it.  For support, see Fisher, who normally uses the ratio $e_{\text{switch}}(f) / e_{\text{orig}}(f)$; in the results shown, this ratio does not fall below 1.0, indicating that $e_{\text{switch}}(f)$ is the larger quantity.  Fisher shows this at the top of Page 19 in Figure 5:

[Fisher, Page 19, Figure 5 (image not reproduced)]
Thus, $e_{\text{switch}}(f) - e_{\text{orig}}(f)$ is non-negative in these results and is therefore effectively an absolute difference.)
calculates a second absolute difference between the basic accuracy and the second accuracy (Fisher, Page 18 Paragraph 2, discloses:  “We analyze a dataset curated by ProPublica, of 6,216 defendants from Broward County, Florida (Larson et al., 2016). The outcome of interest is an indicator of 2-year recidivism. Of the available covariates, we consider 5 variables which we refer to as “admissible.” These variables describe an individual’s age, their prior record, and the severity of the current charge. We also consider two variables which we refer to as “inadmissible,” an individual’s race (categorical) and sex (see Table 1). To answer the above questions, we compute the empirical MR, AR, and MCR on (1) all admissible variables, and on (2) all inadmissible variables.”  Thus, Fisher repeats the process to obtain a second accuracy, using the “inadmissible” variables.)
determines that the first absolute difference is greater than the second absolute difference (Fisher, Bottom of Page 18, discloses:  “The empirical model reliance of f on admissible variables is 1.40, exceeding the empirical reliance of f on inadmissible variables, which is equal to 1.01.”  Here, Fisher discloses comparing the two model reliance measures.  While Fisher uses ratios here, Fisher discloses on Page 21 Section 9.4, as shown above, that a difference-based measure may be used instead.  Because both ratios are computed for the same model f and thus share the same denominator $e_{\text{orig}}(f)$, the corresponding differences $e_{\text{switch}}(f) - e_{\text{orig}}(f) = e_{\text{orig}}(f)\,(MR(f) - 1)$ preserve this ordering.  Thus, Fisher discloses that one difference is greater than the other.)
determines that the first degree of importance is higher than the second degree of importance according to the determination result that the first absolute difference is greater than the second absolute difference (Fisher, Bottom of Page 18, discloses:  “Figure 5 shows the results of our analysis. We find that, overall, well-performing models may rely more heavily on admissible variables than on inadmissible variables”.  Here, Fisher discloses that the comparison of these reliance measures is used to determine the degree of importance of a variable (“rely more heavily”).)
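For illustration only, the comparison mapped to Claim 9 can be sketched as ranking factors by the absolute difference between the basic accuracy and each comparison accuracy.  In the following Python sketch, the accuracy values and factor names are hypothetical placeholders, not Fisher's reported numbers, and each accuracy is assumed to be computed on the same second subset.

```python
def rank_by_importance(basic_accuracy, comparison_accuracies):
    """Return (factor, |basic - comparison|) pairs, most important first.

    A larger absolute difference means that rearranging the factor changed
    the model's performance more, i.e., a higher degree of importance.
    """
    diffs = {
        factor: abs(basic_accuracy - acc)
        for factor, acc in comparison_accuracies.items()
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical accuracies, all measured on the same second subset.
basic = 0.75
comparisons = {"first_specific_factor": 0.5, "second_specific_factor": 0.625}
print(rank_by_importance(basic, comparisons))
# [('first_specific_factor', 0.25), ('second_specific_factor', 0.125)]
```

Taking the absolute value makes the ranking independent of whether the score is an accuracy (which drops after rearranging) or a loss (which rises).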
Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Valera and Fisher for at least the same reasons recited in Claim 1.

As per Claim 10, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches second data and second factors, and the use of a computer (see Rejection to Claim 1).   Fisher teaches a display electrically connected to the processor, being configured to display the second data corresponding to each of the second factors in a display mode corresponding to the data types of the second factors.  (Fisher, top of Page 19, Figure 5, discloses:

[Fisher, Page 19, Figure 5 (image not reproduced)]

Fisher here discloses a display showing the results of the variable importance measures.  The display mode corresponds to the data types, as the data types are a fundamental property of the variables of the data, and the variables were used to calculate a measure of importance, which is shown on the graphs above.)
Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Valera and Fisher for at least the same reasons recited in Claim 1.

Claims 11 and 18-20 are method claims corresponding to apparatus claims 1 and 8-10, respectively.  They are rejected for the same reasons.

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Valera and Fisher in view of Laerd Statistics (“Measures of Central Tendency”).
As per Claim 2, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches wherein the processor generates the first detection result corresponding to each of the first factors (see Rejection to Claim 1).  
Valera teaches generating a second comparison result by comparing a distinct count of the first data corresponding to the first factor with a second threshold; deciding the first detection result according to [the first comparison result and] the second comparison result (Valera, Page 2 Section 2, discloses:  “In order to distinguish between discrete and continuous variables, we can apply simple logic rules, e.g. count the number of unique values that the attribute takes and how many times we observe these attributes.”  Here, Valera discloses “count the number of unique values”, which is a “distinct count”.  This implies the use of a threshold on which to base a decision, as Valera uses this count to decide between continuous and discrete, the threshold being based on how many times the attributes are observed (“how many times we observe these attributes”).  Valera’s method discriminates between conventionally discrete and continuous variables.  However, one of ordinary skill in the art will appreciate that if Valera’s method returns “continuous”, then one can definitively conclude that the variable is magnitudinous, and thus Valera’s method is an effective step in a process to discriminate nominal from magnitudinous variables (“the first detection result”).)
However, the combination of Valera and Fisher does not explicitly teach generating a first comparison result by comparing a mode count of the first data corresponding to the first factor with a first threshold; and deciding the first detection result according to the first comparison result and the second comparison result.
Laerd Statistics teaches generating a first comparison result by comparing a mode count of the first data corresponding to the first factor with a first threshold; deciding the first detection result according to the first comparison result (Laerd Statistics, Page 6, discloses:  “Normally, the mode is used for categorical data”, and on Page 7 discloses:  “We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the other. For example, consider measuring 30 peoples' weight (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer, is probably very unlikely - many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight; that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.”  Here, Laerd Statistics states that the mode is rarely useful for continuous data because “we are more likely not to have any one value that is more frequent than the other”.  Thus the mode count of a sample from a continuous distribution is very likely to be 1.  A mode count of 1 can therefore be used to conclude that a variable is magnitudinous, which necessarily follows from its being continuous.  Whether the mode count equals 1 is thus an effective step in a process to discriminate nominal from magnitudinous variables (“the first detection result”)).
The combination of Valera and Fisher with Laerd Statistics together suggests deciding the first detection result according to the first comparison result and the second comparison result.  (Valera suggests comparing a count of distinct values to a threshold to distinguish discrete from continuous, and Laerd Statistics suggests that the mode count of a sample from a continuous distribution is very likely to be 1.  Applying Valera’s concept of comparing to a threshold to Laerd Statistics’ mode count, one of ordinary skill in the art will appreciate that one or both of these concepts can be used to positively identify magnitudinous variables, and thus are effective steps in a process to make the decision for the first detection result between nominal and magnitudinous.)
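The two comparisons discussed above can be combined as in the following Python sketch. The mode threshold of 1 follows the reasoning about continuous samples; the 0.5 distinct-count ratio and the rule that both comparisons must indicate "continuous" are assumptions for illustration, not taken from Valera or Laerd Statistics. The weights echo Laerd Statistics' example of weights measured to the nearest 0.1 kg, but the specific values are made up.

```python
from collections import Counter


def mode_count(values):
    """Number of occurrences of the most frequent value."""
    if not values:
        raise ValueError("values must be non-empty")
    return Counter(values).most_common(1)[0][1]


def first_detection_result(values, mode_threshold=1, ratio_threshold=0.5):
    """Combine both comparisons with a hypothetical AND rule."""
    first = mode_count(values) <= mode_threshold                # first comparison result
    second = len(set(values)) / len(values) > ratio_threshold  # second comparison result
    return "continuous" if first and second else "discrete"


print(first_detection_result([67.4, 71.2, 58.9, 80.1]))   # continuous
print(first_detection_result(["A", "B", "A", "C", "A"]))  # discrete
```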
Laerd Statistics and the combination of Valera and Fisher are analogous art because Laerd Statistics is reasonably pertinent to the problem faced by Valera and Fisher (see MPEP 2141.01(a)(I): “Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).”)
It would have been obvious before the effective filing date of the claimed invention to combine the type recognition model of Valera and Fisher with the mode count of Laerd Statistics.  One of ordinary skill in the art would be motivated to do so in order to have a more robust way of making the decision, with two decision points (the mode count and Valera’s distinct-count method), wherein the mode count provides a strong theoretical basis for distinguishing between nominal and magnitudinous distributions, due to the statistical properties of a sample taken from a continuous distribution (Laerd Statistics:  “This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the other.”)

Claim 12 is a method claim corresponding to apparatus claim 2.  It is rejected for the same reasons.

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Valera and Fisher in view of Minitab (“Assumptions and Normality”).
As per Claim 3, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches first historical records and first detection results (see Rejection to Claim 1).  However, the combination of Valera and Fisher does not explicitly teach wherein the processor further generates a second detection result for each of the first factors by comparing the first data corresponding to each of the first factor with a normal distribution model, each of the second detection results is one of the continuous data type and the discrete data type.
Minitab teaches wherein the processor further generates a second detection result for each of the first factors by comparing the first data corresponding to each of the first factor with a normal distribution model, each of the second detection results is one of the continuous data type and the discrete data type. (Minitab, Page 2, discloses:  “The underlying assumption, before performing a normality test, is that the data is continuous. When viewing discrete data, you lack information between any two integer values. This loss of information can make it hard to assess normality, i.e. that the underlying distribution really does resemble a bell curve with a specific mean and standard deviation. This bell curve assumes that you are looking at values between integers as well. Although it can make for a really nice histogram, it can make for disastrous results when performing a normality test.”  Here, Minitab discloses that discrete data will not properly correspond to a normal distribution model, but continuous data possibly can.  Thus, if a distribution passes a normality test, it must be continuous, and therefore one of ordinary skill in the art will appreciate that it must then be magnitudinous. Therefore a normality test is an effective step in a process to discriminate nominal from magnitudinous variables (“the second detection result”)).
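Minitab does not name a particular normality test. The Python sketch below substitutes the Jarque-Bera test (based on sample skewness and kurtosis), chosen only because its p-value has a closed form for two degrees of freedom, exp(-JB/2), so it runs with the standard library; it is an asymptotic test and is not the only, or necessarily the best, choice. Mapping a failed test to "discrete" is an assumption made to produce the binary second detection result; as noted in this action, a failed test is itself inconclusive.

```python
import math
import statistics


def jarque_bera_pvalue(values):
    """Asymptotic Jarque-Bera p-value; small values indicate non-normal data."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        return 0.0  # constant column: treat as non-normal
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return math.exp(-jb / 2)  # chi-square survival function with 2 degrees of freedom


def second_detection_result(values, alpha=0.05):
    """Hypothetical mapping: passing the test -> continuous, failing -> discrete."""
    return "continuous" if jarque_bera_pvalue(values) >= alpha else "discrete"


# Usage: evenly spaced standard-normal quantiles versus a heavily skewed column.
normal_like = [statistics.NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
print(second_detection_result(normal_like))           # continuous
print(second_detection_result([1] * 50 + [100] * 2))  # discrete
```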
The combination of Valera and Fisher with Minitab further teaches wherein the processor trains the data type recognition model according to the first historical records, the first detection results, and the second detection results. (Valera, Page 2 Right Column Second Paragraph, discloses:  “In contrast, in this paper we proposed a general method that allows us to distinguish among real-valued, positive real-valued and interval data as types of continuous variables, and among categorical, ordinal and count data as types of discrete variables.” Valera continues:  “In this section, we introduce a Bayesian method to determine the statistical type of variable that corresponds to each of the attributes describing the objects in an observation matrix X. In particular, we propose a probabilistic model”.  In the preceding passages, Valera discloses a data type recognition model (“probabilistic model” that “distinguish” among “types” of variables), which requires training.  Valera, Page 7 Section 4.2, discloses:  “In this section, we evaluate the performance of the proposed method on seven real datasets collected from the UCI machine learning repository (Lichman, 2013)” and thus discloses historical records.  The training and results of the model depend on the data types of the historical records, and thus can be considered to be according to the first and second detection results, which are indications of the data types.  Applying Valera’s concept of determining the subtype of data with machine learning, it is obvious that the result of the normality test can be used as an input feature in training a machine learning model to determine whether data is nominal or magnitudinous, wherein normality is indicative of magnitudinous data.)
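Valera's model is a Bayesian probabilistic model; purely to illustrate the idea of training on detection results, the Python sketch below swaps in a small logistic regression fitted by stochastic gradient descent. Each hypothetical historical factor contributes one row whose features are its first and second detection results (1 = continuous, 0 = discrete) and whose label is its known data type. The history rows, learning rate, and epoch count are assumptions, not values from any cited reference.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Logistic regression fitted by plain stochastic gradient descent on log-loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_proba(model, x):
    """Probability that a factor with detection results x is continuous."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical historical records: (first result, second result) -> known type,
# with 1 = continuous and 0 = discrete throughout.
HISTORY = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]
model = train_logistic([x for x, _ in HISTORY], [y for _, y in HISTORY])
print(round(predict_proba(model, (1, 0)), 3))  # high: one detector says continuous
print(round(predict_proba(model, (0, 0)), 3))  # low: both detectors say discrete
```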
Minitab and the combination of Valera and Fisher are analogous art because Minitab is reasonably pertinent to the problem faced by Valera and Fisher (see MPEP 2141.01(a)(I): “Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).”)
It would have been obvious before the effective filing date of the claimed invention to combine the type recognition model of Valera and Fisher with the normality test of Minitab.  One of ordinary skill in the art would be motivated to do so because, while a negative normality test would be inconclusive, a positive normality test can be a reliable indicator of a continuous variable, and can thus positively identify a magnitudinous variable (Minitab:  “The underlying assumption, before performing a normality test, is that the data is continuous. When viewing discrete data, you lack information between any two integer values. This loss of information can make it hard to assess normality, i.e. that the underlying distribution really does resemble a bell curve with a specific mean and standard deviation. This bell curve assumes that you are looking at values between integers as well. Although it can make for a really nice histogram, it can make for disastrous results when performing a normality test.”)

Claim 13 is a method claim corresponding to apparatus claim 3.  It is rejected for the same reasons.

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Valera and Fisher in view of Scikit-Learn (“sklearn.preprocessing.LabelEncoder”).  (See 112(a)/(b) rejection.  Examiner interprets this limitation as using LabelEncoder as a first step in the process described in Claims 2/12.)
As per Claim 4, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches first historical records and first detection results (see Rejection to Claim 1).  However, the combination of Valera and Fisher does not explicitly teach wherein the processor further generates a third detection result for each of the first factors by analyzing a discontinuity of the first data corresponding to each of the first factors by a LabelEncoder, each of the third detection results is one of the continuous data type and the discrete data type.
Scikit-Learn teaches wherein the processor further generates a third detection result for each of the first factors by analyzing a discontinuity of the first data corresponding to each of the first factors by a LabelEncoder, each of the third detection results is one of the continuous data type and the discrete data type (Scikit-Learn, Page 1, discloses:  “LabelEncoder can be used to normalize labels. It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.”  
Valera, as shown in the Rejection to Claim 2, discloses comparing a distinct count of the first data corresponding to the first factor with a second threshold in Page 2 Section 2:  “In order to distinguish between discrete and continuous variables, we can apply simple logic rules, e.g. count the number of unique values that the attribute takes and how many times we observe these attributes.”  Here, Valera discloses “analyzing a discontinuity”, as determining whether a variable is continuous or “not continuous” is “analyzing a discontinuity”.  As shown in the rejection for Claim 2, if Valera’s method returns “continuous”, then one can conclude that the data is magnitudinous.  Thus, this is an effective step in a process to distinguish nominal from magnitudinous data (“third detection result”). One of ordinary skill in the art will appreciate that one can analyze the results of LabelEncoder to retrieve the distinct count, and proceed with Valera’s method.  LabelEncoder can simply be used as an efficient preliminary step that maps the values to consecutive integer codes, with the sorted distinct values available as its classes, before analyzing the distinct count.  For example, LabelEncoder maps (34, 7, 9, 34, 7, 7) to (2, 0, 1, 2, 0, 0), and the distinct count (3) can be read directly from the classes (7, 9, 34); likewise, a collection of strings can be encoded once rather than requiring multiple string comparisons).
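scikit-learn's `LabelEncoder` exposes `fit_transform` and a `classes_` attribute holding the sorted distinct labels. So that it runs without scikit-learn, the Python sketch below uses a hypothetical stand-in, `label_encode`, that mimics that behaviour with the standard library, then applies a distinct-count rule to obtain a third detection result. The 0.5 ratio threshold is an assumption for illustration.

```python
def label_encode(values):
    """Stand-in for LabelEncoder.fit_transform: returns (classes, codes), where
    classes are the sorted distinct values and codes index into them."""
    classes = sorted(set(values))  # values must be hashable and comparable
    index = {label: code for code, label in enumerate(classes)}
    return classes, [index[v] for v in values]


def third_detection_result(values, ratio_threshold=0.5):
    """Hypothetical rule: the distinct count is simply len(classes)."""
    if not values:
        raise ValueError("values must be non-empty")
    classes, _codes = label_encode(values)
    return "continuous" if len(classes) / len(values) > ratio_threshold else "discrete"


classes, codes = label_encode([34, 7, 9, 34, 7, 7])
print(classes, codes)                                # [7, 9, 34] [2, 0, 1, 2, 0, 0]
print(third_detection_result([34, 7, 9, 34, 7, 7]))  # discrete
```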
The combination of Valera and Fisher with Scikit-Learn further teaches wherein the processor trains the data type recognition model according to the first historical records, the first detection results, and the third detection results. (Valera, Page 2 Right Column Second Paragraph, discloses:  “In contrast, in this paper we proposed a general method that allows us to distinguish among real-valued, positive real-valued and interval data as types of continuous variables, and among categorical, ordinal and count data as types of discrete variables.” Valera continues:  “In this section, we introduce a Bayesian method to determine the statistical type of variable that corresponds to each of the attributes describing the objects in an observation matrix X. In particular, we propose a probabilistic model”.  In the preceding passages, Valera discloses a data type recognition model (“probabilistic model” that “distinguish” among “types” of variables), which requires training.  Valera, Page 7 Section 4.2, discloses:  “In this section, we evaluate the performance of the proposed method on seven real datasets collected from the UCI machine learning repository (Lichman, 2013)” and thus discloses historical records.  The training and results of the model depend on the data types of the historical records, and thus can be considered to be according to the first and third detection results, which are indications of the data types.)
Scikit-Learn and the combination of Valera and Fisher are analogous art because Scikit-Learn is reasonably pertinent to the problem faced by Valera and Fisher (see MPEP 2141.01(a)(I): “Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).”)
It would have been obvious before the effective filing date of the claimed invention to combine the type recognition model of Valera and Fisher with the LabelEncoder of Scikit-Learn.  One of ordinary skill in the art would be motivated to do so because, rather than writing custom code to analyze the distinct values, they could have LabelEncoder do the preliminary work, making the code less complex. For example, LabelEncoder maps (34, 7, 9, 34, 7, 7) to (2, 0, 1, 2, 0, 0), the distinct count can be read directly from the sorted classes (7, 9, 34), and a collection of strings can be encoded once rather than requiring multiple string comparisons (Scikit-Learn:  “LabelEncoder can be used to normalize labels. It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.”)

Claim 14 is a method claim corresponding to apparatus claim 4.  It is rejected for the same reasons.

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Valera and Fisher in view of Laerd Statistics (“Kruskal-Wallis H Test using SPSS Statistics”). 
As per Claim 5, the combination of Valera and Fisher teaches the apparatus of Claim 1.  Valera teaches first historical records and first detection results (see Rejection to Claim 1).  However, the combination of Valera and Fisher does not explicitly teach wherein the processor further generates a fourth detection result for each of the first factors by performing the following operations on each of the first factors: dividing the first data corresponding to the first factor into a plurality of data groups; calculating a measure of central tendency of each of the data groups; calculating a second dissimilarity degree among the measures of central tendency; deciding the fourth detection result according to the second dissimilarity degree, wherein the fourth detection result is one of the continuous data type and the discrete data type.
Laerd Statistics teaches wherein the processor further generates a fourth detection result for each of the first factors by performing the following operations on each of the first factors: dividing the first data corresponding to the first factor into a plurality of data groups (Laerd Statistics, Pg 10, discloses:  “The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable”.  Laerd Statistics, Pg 12 gives an example:  “The researcher then recruits a group of 60 individuals with a similar level of back pain and randomly assigns them to one of three groups – Drug A, Drug B or Drug C treatment groups”.  In these passages, Laerd Statistics discloses dividing the data into a plurality of data groups.)
calculating a measure of central tendency of each of the data groups (Laerd Statistics, Pg 12, discloses:  “If your distributions have the same shape, you can use SPSS Statistics to carry out a Kruskal-Wallis H test to compare the medians of your dependent variable (e.g., "engagement score") for the different groups of the independent variable you are interested in”.  Here, Laerd Statistics discloses calculating the median of each group, and the median is a measure of central tendency).
calculating a second dissimilarity degree among the measures of central tendency (Laerd Statistics, Pg 12, discloses:  “If your distributions have the same shape, you can use SPSS Statistics to carry out a Kruskal-Wallis H test to compare the medians of your dependent variable (e.g., "engagement score") for the different groups of the independent variable you are interested in”.  Here, Laerd Statistics discloses to “compare” the medians.  Also, Laerd Statistics, Pg 18, discloses: “Remember, the distribution of your data will determine whether you can report differences with respect to medians.”  Here, Laerd Statistics explicitly discloses calculating a dissimilarity degree among the measures of central tendency (“differences with respect to medians”)).
The combination of Valera and Fisher with Laerd Statistics together suggests deciding the fourth detection result according to the second dissimilarity degree, wherein the fourth detection result is one of the continuous data type and the discrete data type. (Recall that Valera suggests that each detection result is a subtype of either the nominal data type or the magnitudinous data type.  Laerd Statistics, Pg 10, discloses:  “When you choose to analyse your data using a Kruskal-Wallis H test, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using a Kruskal-Wallis H test.” Laerd Statistics, Pg 11, continues:  “Your dependent variable should be measured at the ordinal or continuous level.”  Here, Laerd Statistics discloses that the Kruskal-Wallis test only works if data is at the ordinal or continuous level, which corresponds exactly to Applicant’s magnitudinous data.  This means it does not work for nominal data, which does not have an order (e.g., 1,2,3,4,5 are “ordinal” if they represent a pain scale, but “nominal” if they simply represent identifications such as a Person ID).  Laerd Statistics, Pg 12, further explains:  “Having similar distributions simply allows you to use medians to represent a shift in location between the groups.”  Nominal data does not even have a meaningful median, since the median depends on an ordering that nominal values lack.  Thus, it is highly likely that grouping nominal data and running Kruskal-Wallis on it to compare the central tendencies will result in a large dissimilarity measure among those central tendencies.  This large dissimilarity (“dissimilarity…is obvious…discrete”, as described by Applicant in Instant Specification Page 14) is a strong indicator of a nominal variable.  Thus, this method is an effective step in a process to distinguish nominal from magnitudinous data.
Applying Valera’s concept of determining the subtype of data with machine learning, it is obvious that differences among the central tendencies of the data groups can be used as an input feature in training a machine learning model to determine whether data is nominal or magnitudinous, wherein a large difference is indicative of nominal data, and a small difference indicates a higher likelihood of magnitudinous data.)
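To make the four operations of Claim 5 concrete (divide into groups, take each group's median, measure dissimilarity, decide), the following Python sketch splits a column into three contiguous groups in record order, reports each group's median, and uses the Kruskal-Wallis H statistic (with the usual tie correction) as the dissimilarity degree. The grouping scheme, the choice of H as the dissimilarity measure, and the rule that a large H indicates a discrete (nominal) factor are assumptions mirroring the reasoning above; 5.991 is the 95th percentile of the chi-square distribution with k - 1 = 2 degrees of freedom and applies only for k = 3.

```python
import statistics
from collections import Counter


def split_into_groups(values, k=3):
    """Hypothetical grouping: k contiguous, near-equal chunks in record order."""
    n = len(values)
    if k < 2 or n < k:
        raise ValueError("need at least two non-empty groups")
    return [values[i * n // k:(i + 1) * n // k] for i in range(k)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # all values equal: no spread


def fourth_detection_result(values, k=3, critical=5.991):
    groups = split_into_groups(values, k)
    medians = [statistics.median(g) for g in groups]  # measures of central tendency
    h = kruskal_wallis_h(groups)                      # second dissimilarity degree
    return medians, h, ("discrete" if h > critical else "continuous")


print(fourth_detection_result([1, 2, 3, 1, 2, 3, 1, 2, 3]))        # H near 0: continuous
print(fourth_detection_result([1, 2, 3, 10, 11, 12, 20, 21, 22]))  # H about 7.2: discrete
```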
The combination of Valera and Fisher with Laerd Statistics further teaches wherein the processor trains the data type recognition model according to the first historical records, the first detection results, and the fourth detection results. (Valera, Page 2 Right Column Second Paragraph, discloses:  “In contrast, in this paper we proposed a general method that allows us to distinguish among real-valued, positive real-valued and interval data as types of continuous variables, and among categorical, ordinal and count data as types of discrete variables.” Valera continues:  “In this section, we introduce a Bayesian method to determine the statistical type of variable that corresponds to each of the attributes describing the objects in an observation matrix X. In particular, we propose a probabilistic model”.  In the preceding passages, Valera discloses a data type recognition model (“probabilistic model” that “distinguish” among “types” of variables), which requires training.  Valera, Page 7 Section 4.2, discloses:  “In this section, we evaluate the performance of the proposed method on seven real datasets collected from the UCI machine learning repository (Lichman, 2013)” and thus discloses historical records.  The training and results of the model depend on the data types of the historical records, and thus can be considered to be according to the first and fourth detection results, which are indications of the data types.)
Laerd Statistics and the combination of Valera and Fisher are analogous art because Laerd Statistics is reasonably pertinent to the problem faced by Valera and Fisher (see MPEP 2141.01(a)(I): “Rather, a reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention).”)


Claim 15 is a method claim corresponding to apparatus claim 5.  It is rejected for the same reasons.

Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Valera and Fisher in view of Arvai (“Fine tuning a classifier in scikit-learn”).
As per Claim 6, the combination of Valera and Fisher teaches the apparatus of claim 1.  Valera teaches second data, second factor, data type recognition model and the processor determines the data type of each of the second factors (see Rejection to Claim 1).  However, the combination of Valera and Fisher does not explicitly teach wherein the data type recognition model has a threshold, and the processor determines the data type of each of the second factors by performing the following operations on each of the second factors: calculating a data type recognition value by the data type recognition model and the second 
Arvai teaches wherein the [data type recognition] classifier model has a threshold, and the processor determines the [data type of each of the second factors] class by performing the following operations [on each of the second factors]: calculating a [data type recognition] value by the [data type recognition] classifier model [and the second data corresponding to the second factor]; and determining the [data type] class by comparing the [data type recognition] value with the threshold.  (Recall that Valera teaches a data type recognition model, which is a classifier, as it classifies data into data types.  Arvai, Page 2, discloses:  “Adjust the decision threshold using the precision-recall curve and the roc curve, which is a more involved method that I will walk through.”  Here, Arvai, who states in the title “fine tuning a classifier”, discloses a decision threshold for the classifier.)
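As a sketch of how a decision threshold turns the recognition value into a data type, the Python function below assumes (hypothetically) that the data type recognition model outputs, for each second factor, a score in [0, 1] interpreted as the likelihood that the factor is continuous. The factor names and scores are illustrative only; raising the threshold from 0.5 to 0.6 flips the borderline factor, which is the kind of adjustment Arvai describes.

```python
def determine_data_types(recognition_values, threshold=0.5):
    """Map each second factor's recognition value to a data type by thresholding.

    Assumes a higher value means "more likely continuous" (hypothetical convention).
    """
    return {
        factor: "continuous" if value >= threshold else "discrete"
        for factor, value in recognition_values.items()
    }


scores = {"age": 0.92, "zip_code": 0.31, "score": 0.55}  # hypothetical factors
print(determine_data_types(scores))                 # "score" -> continuous
print(determine_data_types(scores, threshold=0.6))  # "score" -> discrete
```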
Arvai and the combination of Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the data type prediction of Valera with the decision threshold adjustment of Arvai.  One would be motivated to do so to adjust the classifier to meet one’s specific goals by optimizing for precision or recall, that is, by minimizing false positives or false negatives depending on the operator’s intended goals.  This will allow the operator to produce more accurate results for their purposes (Arvai, Pg. 5:  “That classifier was optimized for precision. For comparison, to show how GridSearchCV selects the best classifier, the function call below returns a classifier optimized for recall. The grid might be similar to the grid above, the only 

As per Claim 7, the combination of Valera and Fisher teaches the apparatus of claim 1.  Valera teaches second factor and data type recognition (see Rejection to Claim 1).  However, the combination of Valera and Fisher does not explicitly teach wherein the processor further calculates a data type accuracy of each of the second factors according to the data type recognition value of each of the second factors and the threshold.
Arvai teaches wherein the processor further calculates a [data type] accuracy [of each of the second factors] according to the [data type recognition] value [of each of the second factors] and the threshold. (Arvai, Page 2, discloses:  “Adjust the decision threshold using the precision-recall curve and the roc curve, which is a more involved method that I will walk through.”  Here, Arvai discloses calculating a data type accuracy (“precision-recall curve and the roc curve”), and these measures depend on the threshold, since they change as the threshold changes.)
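The dependence of accuracy measures on the threshold can be illustrated with a small Python sketch that scores hypothetical validation factors whose true data types are known and reports precision, recall, and accuracy for the "continuous" class at a given threshold. The scores and labels are made up; treating "continuous" as the positive class and reporting 0.0 when a ratio is undefined are assumptions.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Precision, recall, and accuracy for the positive ("continuous") class.

    labels: 1 = truly continuous, 0 = truly discrete.
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(scores) if scores else 0.0
    return precision, recall, accuracy


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(metrics_at_threshold(scores, labels, 0.5))   # (0.5, 0.5, 0.5)
print(metrics_at_threshold(scores, labels, 0.35))  # precision 2/3, recall 1.0, accuracy 0.75
```

Lowering the threshold here raises recall and accuracy, illustrating that the measures are computed according to the threshold.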
Arvai and the combination of Valera and Fisher are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Arvai and the combination of Valera and Fisher for at least the same reasons recited in Claim 6.

Claims 16-17 are method claims corresponding to apparatus claims 6-7, respectively.  They are rejected for the same reasons.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126     
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126