Quantiles can be defined as an array of cut points (e.g., [0.25, 0.5, 0.75]) or as a number indicating the desired number of bins. Each bin can optionally be assigned a label.

Usage

The following examples show how the step can be used in a recipe.

Examples

The following parameters will discretize a column with values in the range [0, 1], producing a new categorical column with bins defined by the borders [0, 0.25), [0.25, 0.5), [0.5, 0.75) and [0.75, 1]. The bins will have labels “q1”, “q2”, “q3” and “q4” respectively.
discretize_on_quantiles(ds.price, {
    "quantiles": [0.25, 0.5, 0.75],
    "labels": ["q1", "q2", "q3", "q4"]
}) -> (ds.price_category)
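
As a rough illustration of the bin assignment described above, the following Python sketch maps values to labels using the stated borders. It is not the step's implementation: `assign_bin` is a hypothetical helper, and it takes the borders [0.25, 0.5, 0.75] at face value instead of estimating them from the column's distribution.

```python
from bisect import bisect_right


def assign_bin(value, borders, labels):
    """Return the label of the bin that `value` falls into.

    Bins are left-closed and right-open; the last bin has no upper border,
    so it also contains the top of the range.
    `assign_bin` is a hypothetical helper, not part of the recipe language.
    """
    if len(labels) != len(borders) + 1:
        raise ValueError("need exactly one more label than borders")
    # bisect_right counts how many borders are <= value; that count is the bin index.
    return labels[bisect_right(borders, value)]


borders = [0.25, 0.5, 0.75]           # inner borders from the example above
labels = ["q1", "q2", "q3", "q4"]

prices = [0.0, 0.1, 0.25, 0.6, 0.75, 1.0]
price_category = [assign_bin(p, borders, labels) for p in prices]
print(price_category)
```

A value that sits exactly on a border (0.25 or 0.75 here) goes to the bin that starts at that border, which matches the half-open intervals listed in the description.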

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input
column[number]
required
A numeric column to discretize.
output
column[category]
required
A new categorical column with categories corresponding to discretized bins.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

quantiles
[integer, array[number]]
required
Quantiles. Defines the quantiles or number of bins for discretizing the column. Can be an array of cut points or a number indicating the number of bins. Values must be in the following range:
2 ≤ quantiles < inf
Item
number
Each item in array. Values must be in the following range:
0 ≤ Item ≤ 1
  • 4
  • [0.25, 0.5, 0.75]
labels
array[string]
default:"['q1', 'q2', 'q3', 'q4']"
Names for the resulting bins. Needs one more label than the number of quantile cut points.
Item
string
Each item in array.
  • ["Q1", "Q2", "Q3", "Q4"]
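
The constraints on `quantiles` and `labels` can be checked before running a recipe. The Python sketch below is one way to do that under stated assumptions: `normalize_quantiles` and `check_labels` are hypothetical helpers, an integer is assumed to mean equally spaced cut points, and the bounds on array items are assumed to be inclusive. None of this is confirmed as the step's internal behaviour.

```python
def normalize_quantiles(quantiles):
    """Turn the `quantiles` parameter into a list of cut points.

    Hypothetical helper. Assumes an integer n means n bins separated by
    the equally spaced cut points 1/n, 2/n, ..., (n-1)/n.
    """
    if isinstance(quantiles, bool):
        raise TypeError("quantiles must be an integer or a list of numbers")
    if isinstance(quantiles, int):
        if quantiles < 2:
            raise ValueError("the number of bins must be at least 2")
        return [i / quantiles for i in range(1, quantiles)]
    cuts = [float(q) for q in quantiles]
    # Assumed inclusive bounds: every cut point lies between 0 and 1.
    if any(q < 0 or q > 1 for q in cuts):
        raise ValueError("every cut point must lie between 0 and 1")
    return cuts


def check_labels(cuts, labels):
    """Require exactly one more label than there are cut points."""
    if len(labels) != len(cuts) + 1:
        raise ValueError(
            f"{len(cuts)} cut points need {len(cuts) + 1} labels, got {len(labels)}"
        )


cuts = normalize_quantiles(4)
check_labels(cuts, ["q1", "q2", "q3", "q4"])
print(cuts)
```

Under these assumptions, passing 4 and passing [0.25, 0.5, 0.75] describe the same four bins, so the same four labels are valid for both.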