trim_frequencies
Remove values whose frequencies (counts) are above/below a given threshold.
Affected categories are replaced with the missing value (NaN).
Usage
The following examples show how the step can be used in a recipe.
To remove categories occurring fewer than 2 times in the column cat_col:
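The behaviour described here can be sketched in plain Python. This is only an illustration of the semantics, not the step's implementation or its recipe syntax: the function name trim_rare and the parameter min_count are hypothetical, and None stands in for the missing value (NaN).

```python
from collections import Counter

def trim_rare(values, min_count):
    """Replace values occurring fewer than min_count times with None.

    Hypothetical helper: None plays the role of the missing value (NaN).
    """
    # Count only non-missing values.
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is not None and counts[v] >= min_count else None
        for v in values
    ]

# Stand-in for the column cat_col: "c" occurs only once.
cat_col = ["a", "b", "a", "c", "b", "a"]
trimmed = trim_rare(cat_col, 2)
print(trimmed)
```

Here "a" (3 occurrences) and "b" (2 occurrences) are kept, while the single "c" is replaced with the missing value.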
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally
columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
The number N indicating how many of the most common values to filter (in descending order).
Values must be in the following range:
Values with a lower frequency (count) than this will be removed.
Values must be in the following range:
Values with a higher frequency (count) than this will be removed.
Values must be in the following range:
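As a rough illustration of how the three thresholds described in this section might interact, the sketch below applies them in plain Python. Everything in it is an assumption rather than the step's actual implementation: the names trim_sketch, top, min_count and max_count are hypothetical, "filtering" the N most common values is interpreted as keeping only those N values, ties in the top N are broken by first appearance, and None stands in for the missing value (NaN).

```python
from collections import Counter

def trim_sketch(values, top=None, min_count=None, max_count=None):
    """Replace values outside the given frequency limits with None.

    Hypothetical parameters (not the step's real names):
      top       -- keep only the N most common values (assumed meaning)
      min_count -- values with a lower count are removed
      max_count -- values with a higher count are removed
    """
    counts = Counter(v for v in values if v is not None)
    keep = set(counts)
    if top is not None:
        # most_common orders by descending count; ties keep first-seen order.
        keep &= {v for v, _ in counts.most_common(top)}
    if min_count is not None:
        keep = {v for v in keep if counts[v] >= min_count}
    if max_count is not None:
        keep = {v for v in keep if counts[v] <= max_count}
    return [v if v in keep else None for v in values]

# "a" occurs 4 times, "b" twice, "c" once.
values = ["a"] * 4 + ["b"] * 2 + ["c"]
only_mid = trim_sketch(values, min_count=2, max_count=3)
top_two = trim_sketch(values, top=2)
print(only_mid)
print(top_two)
```

With a minimum of 2 and a maximum of 3 only "b" survives, while keeping the top 2 values retains "a" and "b" and replaces "c" with the missing value.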