Discretize a column into bins based on quantiles.
Quantiles can be defined as an array of cut points (e.g., [0.25, 0.5, 0.75]) or as a number indicating the desired number of bins. Each bin can optionally be assigned a label.
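The two forms are presumably related: asking for n bins is the same as giving the n - 1 evenly spaced cut points 1/n, 2/n, ..., (n - 1)/n. Under that reading, `4` and `[0.25, 0.5, 0.75]` request the same bins. The minimal sketch below illustrates this. `cut_probabilities` is a hypothetical helper, not part of the recipe API.

```python
def cut_probabilities(quantiles):
    """Return the interior cut points (as probabilities) for a quantiles spec.

    Hypothetical helper: an int n is read as n equal-frequency bins, i.e. the
    cut points 1/n, 2/n, ..., (n-1)/n; a list is taken as the cut points themselves.
    """
    if isinstance(quantiles, int):
        return [i / quantiles for i in range(1, quantiles)]
    return list(quantiles)


# k cut points always define k + 1 bins.
print(cut_probabilities(4))                  # [0.25, 0.5, 0.75]
print(cut_probabilities([0.25, 0.5, 0.75]))  # [0.25, 0.5, 0.75]
print(cut_probabilities(5))                  # [0.2, 0.4, 0.6, 0.8]
```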
The following examples show how the step can be used in a recipe.
Examples
The following parameters will discretize a column with values spread evenly over the range [0, 1], producing a new categorical column with bins defined by the borders [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1]. The bins will have labels “q1”, “q2”, “q3” and “q4” respectively.
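The example above can be mirrored in a minimal Python sketch. It makes two assumptions. First, the quantile borders are computed by linear interpolation between sorted values, which is NumPy's default percentile rule; the step's own rule is not stated. Second, the bins are half-open, matching the borders listed above. `quantile` and `discretize` are hypothetical helpers, not the step's implementation.

```python
import bisect
import math


def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (NumPy's default rule)."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def discretize(values, probs, labels=None):
    """Label each value with its quantile bin.

    Bins are half-open [lo, hi); the last bin also includes the maximum.
    """
    if not values:
        return []
    if labels is None:
        labels = [f"bin{i + 1}" for i in range(len(probs) + 1)]
    if len(labels) != len(probs) + 1:
        raise ValueError("expected one label per bin")
    ordered = sorted(values)
    borders = [quantile(ordered, p) for p in probs]  # interior borders only
    # bisect_right sends a value equal to a border into the bin that starts there,
    # which is what half-open [lo, hi) bins require.
    return [labels[bisect.bisect_right(borders, v)] for v in values]


grid = [i / 100 for i in range(101)]  # values spread evenly over [0, 1]
binned = discretize(grid, [0.25, 0.5, 0.75], ["q1", "q2", "q3", "q4"])
print(binned[0], binned[24], binned[25], binned[100])  # q1 q1 q2 q4
```

Because the borders are computed from the data, they fall exactly on 0.25, 0.5 and 0.75 only when the values are spread evenly. With skewed data, the same cut points give different borders.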
The following parameters will discretize a column into 4 bins, each containing roughly the same number of values.
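With 4 quantile bins, each bin receives about a quarter of the values, while the bin widths generally differ. The sketch below uses Python's standard `statistics.quantiles` as a stand-in for the step. The step's exact interpolation rule is an assumption here. With `n=4`, the function returns the three quartile cut points, and with distinct values every bin ends up with the same count. Tied values can make the counts uneven.

```python
import bisect
import statistics

data = [7, 2, 11, 5, 1, 9, 12, 4, 8, 3, 10, 6]  # twelve distinct values
# n=4 -> three cut points (the quartiles); "inclusive" suits data that is the whole population.
cuts = statistics.quantiles(data, n=4, method="inclusive")

counts = [0] * 4
for v in data:
    counts[bisect.bisect_right(cuts, v)] += 1  # half-open bins [lo, hi)

print(cuts)    # [3.75, 6.5, 9.25]
print(counts)  # [3, 3, 3, 3]
```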
General syntax for using the step in a recipe. Shows the inputs and outputs the step is expected to receive and will produce respectively. For further details see sections below.
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
A numeric column to discretize.
Outputs
A new categorical column with categories corresponding to discretized bins.
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Quantiles. Defines the quantiles or number of bins for discretizing the column. Can be an array of cut points or a number indicating the number of bins.
Values must be in the following range:
Array items
Each item in array.
Values must be in the following range:
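The parameters arrive as a JSON object, so a parameter check can be sketched in Python. The key name `"quantiles"` and the accepted values below are assumptions made for illustration: an integer of at least 2, or an array of strictly increasing cut points strictly between 0 and 1. `parse_step_params` is hypothetical.

```python
import json


def parse_step_params(raw):
    """Parse the trailing JSON parameter object and validate its "quantiles" entry.

    Assumed rules (illustrative only): an integer must be >= 2; an array must
    hold strictly increasing numbers in the open interval (0, 1).
    """
    params = json.loads(raw)
    q = params.get("quantiles")
    if isinstance(q, bool):  # bool is a subclass of int in Python; reject it explicitly
        raise ValueError("quantiles must be a number or an array of numbers")
    if isinstance(q, int):
        if q < 2:
            raise ValueError("quantiles as a number must be at least 2")
    elif isinstance(q, list):
        numeric = all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in q)
        if not q or not numeric:
            raise ValueError("quantiles as an array must contain numbers")
        if not all(0 < x < 1 for x in q):
            raise ValueError("each cut point must lie strictly between 0 and 1")
        if any(a >= b for a, b in zip(q, q[1:])):
            raise ValueError("cut points must be strictly increasing")
    else:  # missing key or unsupported type
        raise ValueError("quantiles must be a number or an array of numbers")
    return params


print(parse_step_params('{"quantiles": [0.25, 0.5, 0.75]}'))  # {'quantiles': [0.25, 0.5, 0.75]}
try:
    parse_step_params('{"quantiles": [0.5, 0.25]}')
except ValueError as err:
    print(err)  # cut points must be strictly increasing
```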
Examples