Each bin can optionally be assigned a label.

Usage

The following example shows how the step can be used in a recipe.

Examples

Example 1
The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins (0, 0.33], (0.33, 0.66], (0.66, 1] (i.e. right-inclusive). The bins will be labelled “low”, “medium”, and “high” respectively.
discretize_on_values(ds.price, {"cuts": [0.33, 0.66], "add_extremes": true, "labels": ["low", "medium", "high"]}) -> (ds.price_category)
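
To make the binning concrete, the following Python sketch mimics what the example describes. It is an illustration only, not the step's implementation: the function name discretize_right_inclusive is hypothetical, and for simplicity the outer bins are treated as open-ended, so the column minimum falls into the first bin (the source does not state how the step itself handles a value exactly at the minimum).

```python
from bisect import bisect_left


def discretize_right_inclusive(values, cuts, labels):
    """Label each value by its right-inclusive bin.

    With cuts [c1, c2] the bins are (..., c1], (c1, c2], (c2, ...),
    so a value equal to a cut point belongs to the bin on its left.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("expected one label per bin")
    ordered = sorted(cuts)
    # bisect_left counts the cuts strictly below the value, which is
    # exactly the index of the right-inclusive bin holding that value.
    return [labels[bisect_left(ordered, v)] for v in values]


prices = [0.1, 0.33, 0.5, 0.66, 0.9]
categories = discretize_right_inclusive(
    prices, [0.33, 0.66], ["low", "medium", "high"]
)
print(categories)
```

Note that 0.33 and 0.66 land in "low" and "medium" respectively, because each bin includes its right edge.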

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input: column[number], required
A quantitative column to discretize.
output: column[category], required
A new categorical column with categories corresponding to discretized bins.

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
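
If step calls are generated programmatically, the configuration object can be serialized with a standard JSON library. The sketch below is an assumption about one convenient way to do this: render_step is a hypothetical helper, not part of any recipe tooling, and it only builds the call as text following the step(..., {...}) -> (output) pattern.

```python
import json


def render_step(step_name, inputs, params, output):
    """Build a step call as text, with the JSON config as the last input."""
    # json.dumps emits JSON literals, e.g. Python True becomes true.
    config = json.dumps(params)
    args = ", ".join([*inputs, config])
    return f"{step_name}({args}) -> ({output})"


call = render_step(
    "discretize_on_values",
    ["ds.price"],
    {"cuts": [0.33, 0.66], "add_extremes": True, "labels": ["low", "medium", "high"]},
    "ds.price_category",
)
print(call)
```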

Parameters

cuts: array[number], default [0.33, 0.66]
Points/values used to cut the quantitative column into bins. Each item in the array is a number.
labels: array[string], default ["low", "medium", "high"]
Names for the resulting bins. Each item in the array is a string. Important: cutting a series of values in 3 places creates 4 bins.
add_extremes: boolean, default true
Whether to automatically include the minimum and maximum values of the column as cut points.
include_right: boolean, default true
Whether the intervals are right-inclusive, i.e. of the form (x1, x2], or left-inclusive, i.e. of the form [x1, x2).
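
The interaction between add_extremes and include_right can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the step's actual code: bin_edges and bin_index are hypothetical names, and the sketch applies the interval notation strictly, so a value sitting exactly on the open end of the outermost bin falls outside every bin. The source does not say how the step treats that case.

```python
from bisect import bisect_left, bisect_right


def bin_edges(values, cuts, add_extremes=True):
    """Return sorted bin edges, optionally adding the column min and max."""
    edges = set(cuts)
    if add_extremes:
        edges |= {min(values), max(values)}
    return sorted(edges)


def bin_index(value, edges, include_right=True):
    """Return the index of the bin holding value, or None if it is outside."""
    if include_right:
        i = bisect_left(edges, value) - 1   # bins of the form (x1, x2]
    else:
        i = bisect_right(edges, value) - 1  # bins of the form [x1, x2)
    return i if 0 <= i < len(edges) - 1 else None


column = [0.0, 0.2, 0.33, 0.5, 1.0]
edges = bin_edges(column, [0.33, 0.66])
n_bins = len(edges) - 1  # one bin between each pair of consecutive edges

# A value exactly on a cut point switches bins with include_right.
on_cut_right = bin_index(0.33, edges, include_right=True)
on_cut_left = bin_index(0.33, edges, include_right=False)
print(edges, n_bins, on_cut_right, on_cut_left)
```

With the extremes added, two cuts give four edges and therefore three bins, which matches the three default labels.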