Discretize on values¶
binning
Discretize a column by binning its values using explicitly specified cut points.
Each bin can optionally be assigned a label.
Usage¶
The following are the step's expected inputs and outputs and their specific types.
discretize_on_values(input: number, {"param": value}) -> (output: category)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example¶
The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins (0, 0.33], (0.33, 0.66], (0.66, 1] (i.e. right-inclusive). The bins will be labelled "low", "medium", and "high" respectively.
discretize_on_values(ds.price, {"cuts": [0.33, 0.66], "add_extremes": true, "labels": ["low", "medium", "high"]}) -> (ds.price_category)
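To make the binning in this example concrete, here is a minimal pure-Python sketch of the same idea. It is not the step's implementation: the helper name `discretize` is hypothetical, and it assumes that adding the extremes means using the column minimum and maximum as the outer bin edges. Because the bins are open on the left, the sketch leaves a value equal to the minimum unassigned; how the actual step treats that value is not stated here.

```python
from bisect import bisect_left

def discretize(values, cuts, labels):
    """Assign each value to a right-inclusive bin (lo, hi] and return its label."""
    # Assumption: the column minimum and maximum become the outer edges,
    # mirroring "add_extremes": true.
    edges = sorted(set(cuts) | {min(values), max(values)})
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    result = []
    for v in values:
        # bisect_left finds the first edge >= v, so a value equal to an edge
        # lands in the bin to the left of that edge (right-inclusive bins).
        i = bisect_left(edges, v) - 1
        # i == -1 only for the minimum itself, which lies outside (lo, hi].
        result.append(labels[i] if i >= 0 else None)
    return result

prices = [0.0, 0.2, 0.33, 0.5, 0.66, 0.9, 1.0]
categories = discretize(prices, [0.33, 0.66], ["low", "medium", "high"])
print(categories)
```

Note that 0.33 and 0.66 fall into "low" and "medium" respectively, since each bin includes its right edge.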
Inputs¶
input: column:number
A quantitative column to discretize.
Outputs¶
output: column:category
A new categorical column with categories corresponding to discretized bins.
Parameters¶
cuts: array[number] = [0, 0.5, 1]
Points/values used to cut the quantitative column into bins.
labels: array[string] = ['low', 'high']
Names for the resulting bins. Important: cutting a series of values in 3 places creates 4 bins, so the number of labels should match the number of bins rather than the number of cuts.
add_extremes: boolean = True
Whether to automatically include the minimum and maximum values of the column as cut points.
include_right: boolean = True
Whether the intervals are right-inclusive, i.e. of the form (x1, x2], or left-inclusive, i.e. of the form [x1, x2).
include_lowest: boolean = False
Whether the first interval should be left-inclusive or not.
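The interplay between include_right and include_lowest can be sketched as a small membership test. This is an illustration under stated assumptions, not the step's actual code: the function `bin_index` is hypothetical, the edges are taken as already final (any extremes added), and include_lowest is assumed to matter only for right-inclusive intervals, where the first interval would otherwise exclude its lower bound.

```python
def bin_index(value, edges, include_right=True, include_lowest=False):
    """Return the index of the bin containing value, or None if it is in no bin."""
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        if include_right:
            # Right-inclusive interval (lo, hi]
            inside = lo < value <= hi
            # Assumption: include_lowest closes the left end of the first bin only.
            if i == 0 and include_lowest and value == lo:
                inside = True
        else:
            # Left-inclusive interval [lo, hi)
            inside = lo <= value < hi
        if inside:
            return i
    return None

edges = [0, 0.5, 1]
on_edge_right = bin_index(0.5, edges)                       # 0.5 is in (0, 0.5]
on_edge_left = bin_index(0.5, edges, include_right=False)   # 0.5 is in [0.5, 1)
lowest_default = bin_index(0, edges)                        # 0 is outside (0, 0.5]
lowest_included = bin_index(0, edges, include_lowest=True)  # first bin becomes [0, 0.5]
```

In this sketch a value sitting exactly on a cut point moves from the lower bin to the upper bin when include_right is switched off, and the smallest edge is only captured by the first bin when include_lowest is enabled.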