Affected categories are replaced with the missing value (NaN).

Usage

The following examples show how the step can be used in a recipe.

Examples

To remove categories occurring fewer than 2 times in the column cat_col:
trim_frequencies(ds.cat_col, {"freq_min": 2}) -> (ds.cat_trimmed)
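As a rough illustration of what this example does to the data, the following is a minimal pure-Python sketch, not the step's actual implementation. The function name trim_by_min_frequency and the toy column are hypothetical, and None stands in for the missing value (NaN).

```python
from collections import Counter


def trim_by_min_frequency(values, freq_min):
    """Replace categories seen fewer than freq_min times with None.

    Hypothetical helper for illustration only; None plays the role
    of the missing value (NaN) mentioned in the step description.
    """
    # Count each category, ignoring values that are already missing.
    counts = Counter(v for v in values if v is not None)
    # Keep a value only if its category occurs at least freq_min times.
    return [
        v if v is not None and counts[v] >= freq_min else None
        for v in values
    ]


cat_col = ["a", "b", "a", "c", "b", "a"]
cat_trimmed = trim_by_min_frequency(cat_col, freq_min=2)
print(cat_trimmed)  # ['a', 'b', 'a', None, 'b', 'a']
```

Here "c" occurs only once, so it is the only category replaced by the missing value.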

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input (column[category|list[category]], required): A categorical column to trim.
output (column, required): A categorical column with fewer categories than the input.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

n_most_common
[integer, null]
The number N indicating how many of the most common values to filter (in descending order). Values must be in the following range:
0 ≤ n_most_common < inf
freq_min
[integer, null]
Values with a lower frequency (count) than this will be removed. Values must be in the following range:
1 ≤ freq_min < inf
freq_max
[integer, null]
Values with a higher frequency (count) than this will be removed. Values must be in the following range:
1 ≤ freq_max < inf
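
The parameter types and ranges listed above can be checked before a configuration is passed to the step. The sketch below is one hypothetical way to do that in Python. It assumes the lower bounds are inclusive (0 for n_most_common, 1 for freq_min and freq_max) and that each parameter is either an integer or null (None). The names LOWER_BOUNDS and validate_config are illustrative and not part of the step.

```python
# Assumed inclusive lower bounds, taken from the ranges listed above.
LOWER_BOUNDS = {"n_most_common": 0, "freq_min": 1, "freq_max": 1}


def validate_config(config):
    """Check a trim_frequencies-style config dict (hypothetical helper).

    Each known parameter must be None or an integer that is not
    below its assumed lower bound. Returns the config unchanged.
    """
    for name, value in config.items():
        if name not in LOWER_BOUNDS:
            raise ValueError(f"unknown parameter: {name}")
        if value is None:
            continue  # null is an allowed value for every parameter
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer or None")
        if value < LOWER_BOUNDS[name]:
            raise ValueError(f"{name} must be >= {LOWER_BOUNDS[name]}")
    return config


checked = validate_config({"freq_min": 2, "freq_max": None})
print(checked)  # {'freq_min': 2, 'freq_max': None}
```

A configuration such as {"freq_min": 0} would raise a ValueError under these assumptions.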