Affected categories are replaced with the missing value (NaN).

Usage

The following examples show how the step can be used in a recipe.

Examples

To remove categories occurring fewer than 2 times in the column cat_col:
trim_frequencies(ds.cat_col, {"freq_min": 2}) -> (ds.cat_trimmed)
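The effect of the recipe line above can be illustrated outside Graphext with a small Python sketch. This is not the step's implementation: the helper name trim_by_min_count is hypothetical, and None stands in for the missing value (NaN). It only shows what "fewer than 2 times" means for a concrete column.

```python
from collections import Counter


def trim_by_min_count(values, freq_min):
    """Hypothetical helper: replace values seen fewer than freq_min times with None."""
    # Count only non-missing values; None plays the role of NaN in this sketch.
    counts = Counter(v for v in values if v is not None)
    return [v if v is not None and counts[v] >= freq_min else None for v in values]


cat_col = ["a", "b", "a", "c", "b", "a"]
cat_trimmed = trim_by_min_count(cat_col, freq_min=2)
print(cat_trimmed)  # ['a', 'b', 'a', None, 'b', 'a']
```

Here "c" occurs only once, so it falls below freq_min and becomes missing, while "a" (3 occurrences) and "b" (2 occurrences) are kept.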

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input
column[category|list[category]]
required
A categorical column to trim.
output
column
required
A categorical column with fewer categories than the input.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

n_most_common
[integer, null]
The number N indicating how many of the most common values to filter (in descending order). Values must be in the following range:
0 ≤ n_most_common < inf
freq_min
[integer, null]
Values with a lower frequency (count) than this will be removed. Values must be in the following range:
1 ≤ freq_min < inf
freq_max
[integer, null]
Values with a higher frequency (count) than this will be removed. Values must be in the following range:
1 ≤ freq_max < inf
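
As a rough illustration of how the three parameters could interact, the following Python sketch applies them to a plain list. Several things here are assumptions, not documented behaviour: the function name trim_frequencies_sketch is hypothetical, n_most_common is read as "keep only the N most common values", all given parameters are applied together, ties are broken by first appearance, and None stands in for the missing value (NaN).

```python
from collections import Counter


def trim_frequencies_sketch(values, n_most_common=None, freq_min=None, freq_max=None):
    """Hypothetical sketch of frequency trimming; not the Graphext implementation."""
    # Count non-missing values only; None plays the role of NaN here.
    counts = Counter(v for v in values if v is not None)
    keep = set(counts)
    if n_most_common is not None:
        # Assumption: keep the N most frequent values (ties resolved by first appearance).
        keep &= {v for v, _ in counts.most_common(n_most_common)}
    if freq_min is not None:
        # Drop values whose count is lower than freq_min.
        keep = {v for v in keep if counts[v] >= freq_min}
    if freq_max is not None:
        # Drop values whose count is higher than freq_max.
        keep = {v for v in keep if counts[v] <= freq_max}
    return [v if v in keep else None for v in values]


colours = ["red", "red", "red", "blue", "blue", "green"]

# Only "blue" has a count between 2 and 2 inclusive.
print(trim_frequencies_sketch(colours, freq_min=2, freq_max=2))
# [None, None, None, 'blue', 'blue', None]

# Under the "keep top N" assumption, only "red" survives.
print(trim_frequencies_sketch(colours, n_most_common=1))
# ['red', 'red', 'red', None, None, None]
```

Parameters left as None (the JSON null in a recipe) are simply skipped in this sketch, which matches the [integer, null] types listed above.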