Calculates all pair-wise column dependencies (by default mutual information).
Examples
Columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
step(..., {"param": "value", ...}) -> (output)
Parameters
The default method is "mutual_information", which applies scikit-learn's k-nearest neighbors implementations. It supports both categorical and quantitative variables and is relatively fast, but doesn’t have a natural upper bound (i.e. values are not in the range [0, 1]).
The "distance_correlation" method also supports both categorical and quantitative variables, and has a natural upper bound of 1. It’s relatively slow though, so make sure to select a reasonable value for n_samples.
"distance_correlation_fast" uses an optimized implementation of distance correlation, but only supports quantitative variables.
"pearson" is the standard Pearson correlation coefficient, which also only supports quantitative variables.
Lastly, "predictive_power" calculates a version of the predictive power score (PPS). This essentially fits a decision tree to predict variable y using only variable x as a predictor, and measures its performance relative to a dummy/baseline prediction. It supports both categorical and quantitative variables.
Values must be one of the following:
mutual_information
distance_correlation
distance_correlation_fast
pearson
predictive_power
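To make the difference between "pearson" and "distance_correlation" concrete, here is a minimal pure-Python sketch. It is not the step's actual implementation: the helper names (pearson, distance_correlation, _centered_distances) and the toy data are made up for illustration, and it handles quantitative variables only. It builds full n x n distance matrices, which also hints at why the method gets slow without a sensible n_samples.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def _centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation (quantitative variables only)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2_xy = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    return math.sqrt(max(dcov2_xy, 0.0) / denom) if denom > 0 else 0.0


# Toy data: y is fully determined by x, but the relationship is not linear.
x = [i / 10 for i in range(-20, 21)]  # symmetric around 0
y = [v * v for v in x]

pearson_xy = pearson(x, y)            # close to 0: no linear relationship
dcor_xy = distance_correlation(x, y)  # clearly above 0: dependence detected
dcor_xx = distance_correlation(x, x)  # the natural upper bound of 1
```

On this data the Pearson coefficient misses the dependency entirely, while distance correlation picks it up and stays within its upper bound of 1.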
null will be discarded (the default behavior). Can be set, e.g., to 0 to generate all possible links.
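The effect of setting that value to 0 can be sketched as follows, assuming the parameter acts as a minimum dependency below which a link between two columns is dropped (its name is not given above, so threshold is a hypothetical name here). The function dependency_links, the column names and the data are likewise invented for illustration, and the absolute Pearson coefficient stands in for whichever method is configured.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dependency_links(columns, threshold):
    """Return (col_a, col_b, weight) for every column pair at or above threshold.

    `threshold` is a hypothetical stand-in for the step's cut-off parameter.
    """
    links = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        weight = abs(pearson(a, b))
        if weight >= threshold:
            links.append((name_a, name_b, weight))
    return links


# Hypothetical columns: "b" tracks "a" closely, "c" is unrelated to both.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "c": [5, 1, 4, 2, 6, 3],
}

strong_links = dependency_links(columns, threshold=0.5)  # weak pairs dropped
all_links = dependency_links(columns, threshold=0)       # every pair kept
```

With a cut-off of 0 every one of the three column pairs produces a link, whereas the higher cut-off keeps only the pair that is strongly related.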