discretize_on_quantiles
Discretize a column into bins based on quantiles.
Quantiles can be defined as an array of cut points (e.g., [0.25, 0.5, 0.75]) or as a number indicating the desired number of bins. Each bin can optionally be assigned a label.
Usage
The following examples show how the step can be used in a recipe.
The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins defined by the borders [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1]. The bins will have labels “q1”, “q2”, “q3” and “q4” respectively.
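To make the binning behaviour described above concrete, here is a minimal Python sketch using NumPy. It is not the step's implementation: the function name `discretize_on_quantiles_sketch`, its argument names, and the default `bin_<i>` labels are all assumptions made for illustration. It assumes half-open bins that include their lower border, with the last bin also including the maximum, as in the description above.

```python
import numpy as np


def discretize_on_quantiles_sketch(values, quantiles, labels=None):
    """Hypothetical helper illustrating quantile binning (not the step's code)."""
    values = np.asarray(values, dtype=float)

    # A plain integer is read as "number of bins" and turned into evenly
    # spaced quantile cut points, e.g. 4 -> [0.25, 0.5, 0.75].
    if isinstance(quantiles, int):
        cuts = np.linspace(0.0, 1.0, quantiles + 1)[1:-1]
    else:
        cuts = np.asarray(quantiles, dtype=float)

    # N cut points produce N + 1 bins, so N + 1 labels are needed.
    n_bins = len(cuts) + 1
    if labels is None:
        labels = [f"bin_{i}" for i in range(n_bins)]  # assumed default naming
    if len(labels) != n_bins:
        raise ValueError(f"expected {n_bins} labels, got {len(labels)}")

    # Translate quantile cut points into borders on the value scale.
    edges = np.quantile(values, cuts)

    # side="right" places a value equal to a border into the upper bin,
    # which gives bins of the form [lower, upper).
    idx = np.searchsorted(edges, values, side="right")
    return [labels[i] for i in idx]


# Nine evenly spaced values in [0, 1]; their quartiles fall on 0.25, 0.5, 0.75.
values = np.arange(9) / 8
binned = discretize_on_quantiles_sketch(
    values, [0.25, 0.5, 0.75], labels=["q1", "q2", "q3", "q4"]
)
```

With this input, 0 and 0.125 are labelled "q1", 0.25 and 0.375 "q2", 0.5 and 0.625 "q3", and the remaining values "q4". Passing `4` instead of the cut-point array yields the same result in this sketch.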
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Quantiles. Defines the quantiles or number of bins for discretizing the column. Can be an array of cut points or a number indicating the number of bins.
Values must be in the following range:
Names for the resulting bins. Needs one more label than the number of quantile cut points.