Discretize on quantiles
binning
Discretize a column by binning its values, using the specified quantiles as cut points.
Each bin can optionally be given a custom label.
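The idea behind quantile binning can be sketched in a few lines of Python. This is a minimal illustration, not the step's implementation. It assumes linear interpolation between ranks (NumPy's default quantile method) and assumes that a value equal to a cut point falls into the upper bin. The source does not document either choice. The function names are hypothetical.

```python
from bisect import bisect_right


def quantile_value(sorted_values, q):
    """Value at proportion q of a non-empty sorted list (linear interpolation)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def discretize_on_quantiles(values, quantiles, labels):
    """Label each value with its quantile bin; expects len(labels) == len(quantiles) + 1."""
    ordered = sorted(values)
    # Convert quantile proportions into cut values taken from the data itself.
    cuts = [quantile_value(ordered, q) for q in quantiles]
    # bisect_right sends a value equal to a cut into the upper bin, i.e. [cut_i, cut_i+1).
    return [labels[bisect_right(cuts, v)] for v in values]
```

For example, discretize_on_quantiles([1, 2, 3, 4, 5, 6, 7, 8], [0.25, 0.5, 0.75], ["q1", "q2", "q3", "q4"]) computes the cut values 2.75, 4.5 and 6.25 and returns ["q1", "q1", "q2", "q2", "q3", "q3", "q4", "q4"].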
Usage
The following are the step's expected inputs and outputs and their specific types.
discretize_on_quantiles(input: number, {"param": value}) -> (output: category)
where the object {"param": value} is optional in most cases and, if present, may contain any of the
parameters described in the Parameters section below.
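The sketch below shows how the optional parameter object could relate to the defaults listed under Parameters. It is hypothetical Python: DEFAULTS and resolve_params are illustrative names and are not part of this step. The sketch overlays any supplied keys on the documented default values. Rejecting unknown keys is an added assumption.

```python
import copy

# Hypothetical copy of the documented defaults for discretize_on_quantiles.
DEFAULTS = {
    "quantiles": [0.25, 0.5, 0.75],
    "labels": ["q1", "q2", "q3", "q4"],
}


def resolve_params(params=None):
    """Overlay an optional {"param": value} object on the documented defaults."""
    params = params or {}
    unknown = set(params) - set(DEFAULTS)
    if unknown:
        # Assumption: unrecognised parameter names are treated as errors.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Deep-copy so callers cannot mutate the shared default lists.
    merged = copy.deepcopy(DEFAULTS)
    merged.update(params)
    return merged
```

In this sketch, overriding only quantiles keeps the four default labels. A different number of cut points therefore also needs a matching labels list.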
Example
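The following parameters will discretize the price column into four bins whose cut points are the 25th, 50th and 75th percentiles of its values. The result is a new categorical column in which each bin holds roughly a quarter of the rows. The bins will have labels "q1", "q2", "q3" and "q4", in order from the lowest to the highest values. The cut points are values of the column, not the proportions themselves. The bins would be [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1] only if the values were spread uniformly over [0, 1].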
discretize_on_quantiles(ds.price, {
"quantiles": [0.25, 0.5, 0.75],
"labels": ["q1", "q2", "q3", "q4"]
}) -> (ds.price_category)
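The cut points are quantiles of the column's values, so for skewed data they can be far from evenly spaced. The Python sketch below illustrates this with hypothetical prices. It uses the standard library's statistics.quantiles with method="inclusive" (linear interpolation between ranks) for the quartiles. It also assumes that a value equal to a cut point goes into the upper bin. The step itself may use a different interpolation or boundary rule.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical right-skewed prices: most are small, a few are very large.
prices = [1, 2, 2, 3, 4, 5, 8, 20, 50, 100]
labels = ["q1", "q2", "q3", "q4"]

# Quartile cut points computed from the data (quantiles 0.25, 0.5, 0.75).
cuts = quantiles(prices, n=4, method="inclusive")
print(cuts)  # [2.25, 4.5, 17.0]

# A value equal to a cut point goes into the upper bin.
price_category = [labels[bisect_right(cuts, p)] for p in prices]
print(price_category)
```

The top quartile starts at 17.0, not at a quarter of the price range. Because 10 rows cannot be split into four equal groups, the bins hold 3, 2, 2 and 3 rows.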
Inputs
input: column:number
A numeric column to discretize.
Outputs
output: column:category
A new categorical column with categories corresponding to discretized bins.
Parameters
quantiles: array[number] = [0.25, 0.5, 0.75]
Quantiles (cut points). The proportions, each in [0, 1], at which the input values are split into bins. For example, 0.5 places a cut at the column's median.
Items in quantiles
item: number
Range: 0 ≤ item ≤ 1
Example parameter values:
[0.25, 0.5, 0.75]
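The documented range constraint on quantiles can be checked with a small hypothetical Python helper (validate_quantiles is an illustrative name). The source only states the 0 ≤ item ≤ 1 range. Requiring the cut points to be strictly increasing is an added assumption.

```python
def validate_quantiles(quantiles):
    """Check cut-point proportions against the documented 0 <= item <= 1 range."""
    for q in quantiles:
        if not 0 <= q <= 1:
            raise ValueError(f"quantile {q} is outside [0, 1]")
    # Assumption: cut points must also be strictly increasing.
    if any(a >= b for a, b in zip(quantiles, quantiles[1:])):
        raise ValueError("quantiles must be strictly increasing")
    return quantiles
```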
labels: array[string] = ['q1', 'q2', 'q3', 'q4']
Names for the resulting bins. Important: there must be exactly one more label than there are quantile cut points (3 cuts generate 4 bins).
Example parameter values:
["Q_1", "Q_2", "Q_3", "Q_4"]
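The label-count rule can be expressed as a small hypothetical Python check. default_labels is an illustrative helper that reproduces the q1, q2, ... naming of the default for any number of cut points. It is not a documented feature of this step.

```python
def default_labels(quantiles):
    """Hypothetical q1, q2, ... names: k cut points produce k + 1 bins."""
    return [f"q{i}" for i in range(1, len(quantiles) + 2)]


def check_labels(quantiles, labels):
    """Raise if labels does not hold exactly one name per bin."""
    expected = len(quantiles) + 1
    if len(labels) != expected:
        raise ValueError(
            f"{len(quantiles)} cut points need {expected} labels, got {len(labels)}"
        )
    return labels
```

With the default cut points [0.25, 0.5, 0.75], default_labels returns ["q1", "q2", "q3", "q4"]. This matches the documented default labels.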