Discretize on quantiles

binning

Discretizes a column by binning its values, using the specified quantiles as cut points.

Each bin can optionally be assigned a label.

Example

The following parameters split a column at its 25th, 50th and 75th percentiles, producing a new categorical column with four bins labelled "q1", "q2", "q3" and "q4" respectively. For a column whose values are spread evenly over the range [0, 1], the bins correspond approximately to the intervals [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1].

discretize_on_quantiles(ds.price, {
    "quantiles": [0.25, 0.5, 0.75],
    "labels": ["q1", "q2", "q3", "q4"]
}) -> (ds.price_category)
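To illustrate what quantile binning does to the data, here is a minimal Python sketch using only the standard library. It is not the step's implementation: the function names are hypothetical, and the linear interpolation of quantiles and the left-closed bin edges are assumptions, since this page does not specify either.

```python
from bisect import bisect_right


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, using linear interpolation (assumed method)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def discretize_on_quantiles_sketch(values, quantiles, labels):
    """Hypothetical stand-in for the step: map each value to the label of its bin."""
    ordered = sorted(values)
    # Turn the proportions into actual cut values taken from the data.
    cuts = [quantile(ordered, q) for q in sorted(quantiles)]
    # bisect_right counts the cut values <= v, so a value equal to a cut
    # falls into the bin that starts at that cut (assumed edge handling).
    return [labels[bisect_right(cuts, v)] for v in values]


prices = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
categories = discretize_on_quantiles_sketch(
    prices, [0.25, 0.5, 0.75], ["q1", "q2", "q3", "q4"]
)
print(list(zip(prices, categories)))
```

Note that the cut values are computed from the data itself, so a skewed column would yield bins of unequal width but roughly equal row counts.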

Usage

The following are the step's expected inputs and outputs and their specific types.

discretize_on_quantiles(input: number, {"param": value}) -> (output: category)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.

Inputs


input: column:number

A numeric column to discretize.

Outputs


output: column:category

A new categorical column with categories corresponding to discretized bins.

Parameters


quantiles: array[number] = [0.25, 0.5, 0.75]

Quantiles (cut points). The quantiles at which the input values are cut into bins, expressed as proportions in [0, 1].

Items in quantiles

item: number

Range: 0 ≤ item ≤ 1

Example parameter values:

  • [0.25, 0.5, 0.75]

labels: array[string] = ["q1", "q2", "q3", "q4"]

Names for the resulting bins. Important: the list needs one more name than there are quantile cut points (3 cut points produce 4 bins).

Example parameter values:

  • ["Q_1", "Q_2", "Q_3", "Q_4"]
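The two constraints described above (each quantile within [0, 1], and one more label than cut points) can be checked before running the step. The following is a hedged sketch in Python; `validate_params` is a hypothetical helper, not part of the step's API, and it assumes the parameters are available as a plain dictionary.

```python
def validate_params(params):
    """Return a list of problems found in a parameter dict (hypothetical helper)."""
    # Fall back to the documented default cut points when none are given.
    quantiles = params.get("quantiles", [0.25, 0.5, 0.75])
    labels = params.get("labels")
    problems = []
    for q in quantiles:
        if not 0 <= q <= 1:
            problems.append(f"quantile {q} is outside [0, 1]")
    # n cut points produce n + 1 bins, so n + 1 labels are required.
    if labels is not None and len(labels) != len(quantiles) + 1:
        problems.append(
            f"{len(quantiles)} cut points need {len(quantiles) + 1} labels, "
            f"got {len(labels)}"
        )
    return problems


ok = validate_params({"quantiles": [0.25, 0.5, 0.75],
                      "labels": ["Q_1", "Q_2", "Q_3", "Q_4"]})
bad = validate_params({"quantiles": [0.5, 1.5], "labels": ["low", "high"]})
print(ok)
print(bad)
```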