Discretize on values

binning

Discretize a column by binning its values using explicitly specified cut points.

Each bin can optionally be assigned a label.

Example

The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins (0, 0.33], (0.33, 0.66], (0.66, 1] (i.e. right-inclusive). The bins will be labelled "low", "medium", and "high" respectively.

discretize_on_values(ds.price, {"cuts": [0.33, 0.66], "add_extremes": true, "labels": ["low", "medium", "high"]}) -> (ds.price_category)
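To make the binning in this example concrete, here is a minimal plain-Python sketch of the same idea. It is not the step's actual implementation: the `discretize` helper and the `prices` list are hypothetical, and the sketch assumes that `add_extremes` simply adds the column's minimum and maximum as the outermost cut points, and that a value not covered by any bin is left empty (`None`).

```python
from bisect import bisect_left


def discretize(values, cuts, labels, add_extremes=True):
    """Hypothetical stand-in for the step, using right-inclusive bins (lo, hi]."""
    edges = sorted(cuts)
    if add_extremes:
        # Assumption: the column's min and max become the outermost cut points.
        edges = sorted(set([min(values)] + edges + [max(values)]))
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    result = []
    for v in values:
        # Index i of the bin (edges[i], edges[i + 1]] that contains v.
        i = bisect_left(edges, v) - 1
        # Values outside every bin (e.g. the open left end) get no category.
        result.append(labels[i] if 0 <= i < len(labels) else None)
    return result


prices = [0.0, 0.2, 0.33, 0.5, 0.66, 0.9, 1.0]
price_category = discretize(prices, [0.33, 0.66], ["low", "medium", "high"])
print(price_category)
```

In this sketch the value 0.0 gets no category, because the first bin (0, 0.33] is open on the left. That mirrors what one would expect when `include_lowest` is left at its default of False, though the step's exact handling of that edge value is an assumption here.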

Usage

The following are the step's expected inputs and outputs and their specific types.

discretize_on_values(input: number, {"param": value}) -> (output: category)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.

Inputs


input: column:number

A quantitative column to discretize.

Outputs


output: column:category

A new categorical column with categories corresponding to discretized bins.

Parameters


cuts: array[number] = [0, 0.5, 1]

Points/values used to cut the quantitative column into bins.


labels: array[string] = ['low', 'high']

Names for the resulting bins. Important: cutting a series of values in 3 places creates 4 bins, so the number of labels should match the number of bins, not the number of cuts.


add_extremes: boolean = True

Whether to automatically include the minimum and maximum values of the column as cut points.


include_right: boolean = True

Whether the intervals are right-inclusive, i.e. of the form (x1, x2], or left-inclusive, i.e. of the form [x1, x2).


include_lowest: boolean = False

Whether the first interval should be left-inclusive or not.
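As an illustration of how the parameters above interact, the following plain-Python sketch renders the bins implied by a list of edges in interval notation and checks the label count. The `bin_intervals` helper is hypothetical and only mirrors the descriptions on this page; it assumes that `include_lowest` affects only the first bin and has no effect when bins are already left-inclusive.

```python
def bin_intervals(edges, include_right=True, include_lowest=False):
    """Render each bin between consecutive edges in interval notation."""
    intervals = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        if include_right:
            # Assumption: only the first bin can gain a closed left end.
            left = "[" if (i == 0 and include_lowest) else "("
            right = "]"
        else:
            left, right = "[", ")"
        intervals.append(f"{left}{lo}, {hi}{right}")
    return intervals


# Three interior cuts plus the column's min (0) and max (1): five edges, four bins.
edges = [0, 0.25, 0.5, 0.75, 1]
labels = ["very low", "low", "high", "very high"]
assert len(labels) == len(edges) - 1  # one label per bin, not per cut

print(bin_intervals(edges))
print(bin_intervals(edges, include_lowest=True))
print(bin_intervals(edges, include_right=False))
```

By the same count, the default cuts [0, 0.5, 1] delimit two bins on their own, which is consistent with the two default labels.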