discretize_on_values
Discretize a column by binning its values using explicitly specified cut points.
Each bin can optionally be assigned a label.
Usage
The following example shows how the step can be used in a recipe.
The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins (0, 0.33], (0.33, 0.66], (0.66, 1] (i.e. right-inclusive). The bins will be labelled “low”, “medium”, and “high” respectively.
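The binning described above can be illustrated outside the recipe language with a minimal Python sketch. It is not the step's implementation: the function name `discretize` and the sample values are hypothetical, and it assumes that values falling outside every bin are left unlabelled.

```python
from bisect import bisect_left

def discretize(values, cuts, labels):
    """Right-inclusive binning: v gets labels[i] when cuts[i] < v <= cuts[i + 1]."""
    out = []
    for v in values:
        # bisect_left finds the first cut >= v, so a value equal to a cut
        # is assigned to the bin that ends at that cut.
        i = bisect_left(cuts, v) - 1
        out.append(labels[i] if 0 <= i < len(labels) else None)
    return out

cuts = [0, 0.33, 0.66, 1]            # four edges define three bins
labels = ["low", "medium", "high"]   # one label per bin
binned = discretize([0.05, 0.33, 0.5, 0.66, 0.9, 1.0], cuts, labels)
print(binned)
```

In this sketch, 0.33 and 0.66 fall into "low" and "medium" respectively, because each bin includes its right edge.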
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Points/values used to cut the quantitative column into bins.
Names for the resulting bins. Important: cutting a series of values in 3 places creates 4 bins.
Whether to automatically include the minimum and maximum values of the column as cut points.
Whether the intervals are right-inclusive, i.e. of the form (x1, x2], or left-inclusive, i.e. of the form [x1, x2).
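How these options interact can be sketched in plain Python. The helper names `with_extremes` and `bin_index` are hypothetical, not the step's actual parameter names. How the step treats values lying exactly on the outermost edges is not specified here, so this sketch simply leaves them unassigned.

```python
from bisect import bisect_left, bisect_right

def with_extremes(values, cuts):
    """Add the column's minimum and maximum to the cut points (sorted, de-duplicated)."""
    return sorted(set(cuts) | {min(values), max(values)})

def bin_index(v, cuts, right=True):
    """Return the index of the bin containing v, or None if v lies outside all bins.

    right=True  -> bins of the form (x1, x2]
    right=False -> bins of the form [x1, x2)
    """
    i = (bisect_left(cuts, v) if right else bisect_right(cuts, v)) - 1
    return i if 0 <= i < len(cuts) - 1 else None

values = [1, 2, 5, 7, 10]
cuts = with_extremes(values, [2, 5, 7])   # three interior cuts plus min and max
n_bins = len(cuts) - 1                    # cutting in 3 places yields 4 bins

print(cuts, n_bins)
print(bin_index(5, cuts, right=True))     # 5 belongs to the bin ending at 5
print(bin_index(5, cuts, right=False))    # 5 belongs to the bin starting at 5
```

The same value on a cut point lands in different bins depending on which side of the interval is closed, which is why the choice matters whenever data sits exactly on a boundary.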