The thresholds parameter defines the minimum count or percentage below which a category is hidden. This is useful for decluttering filters when there are many infrequent categories.
This is a UI configuration step that affects how the project is displayed in Graphext. It applies to the dataset referenced in its inputs. If your recipe produces multiple datasets (e.g. a filtered dataset that is then passed to create_project alongside the original), you need to add a separate configure step for each dataset you want to configure.
Usage
The following example shows how the step can be used in a recipe.
Examples
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name e.g. "churn-clf").
Inputs
The column to configure.
Outputs
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
A list of threshold configurations.
A categorical column can have two kinds of thresholds determining whether specific categories will be
hidden from its view in the UI: a minimum number of rows in the current selection below which a category
will be hidden, or a minimum number of rows in the whole dataset (everything). The
thresholds parameter should be a list containing 1 or 2 objects: the configuration of a selection
threshold, and/or the configuration of a threshold for everything.
Array items
Configure categories to be discarded (hidden) in terms of their occurrence in the whole dataset.
Categories with a number (or percentage) of rows in the whole dataset less than
value will be discarded (hidden from the variable’s filter view).
Properties
Whether to apply the threshold to the current selection of rows or all rows in the dataset.
Whether to interpret the threshold value as an absolute (count) or percentage of rows. Values must be one of the following:
ABSOLUTE, PERCENTAGE
Categories less frequent than this value will be discarded (hidden).
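To make the effect of a threshold concrete, here is a minimal Python sketch of which categories would be hidden for a given value under the two interpretations. It is not Graphext's implementation: the function name is hypothetical, and it assumes that a PERCENTAGE value is expressed on a 0 to 100 scale and that the comparison is a strict "less than", as the wording above suggests.

```python
from collections import Counter


def hidden_categories(rows, value, kind="ABSOLUTE"):
    """Return the set of categories whose frequency falls below `value`.

    rows  -- category labels, one per row (the selection or the whole dataset)
    value -- the threshold; a row count for ABSOLUTE, or a percentage of rows
             (assumed 0-100 scale) for PERCENTAGE
    """
    if kind not in ("ABSOLUTE", "PERCENTAGE"):
        raise ValueError("kind must be ABSOLUTE or PERCENTAGE")
    counts = Counter(rows)
    total = len(rows)
    hidden = set()
    for category, n in counts.items():
        # Compare either the raw count or the share of rows against the threshold.
        measure = n if kind == "ABSOLUTE" else 100.0 * n / total
        if measure < value:
            hidden.add(category)
    return hidden
```

For example, with six rows of "a", three of "b" and one of "c", an ABSOLUTE threshold of 2 hides only "c", while a PERCENTAGE threshold of 35 hides both "b" and "c". Passing only the currently selected rows mimics a selection threshold; passing all rows mimics a threshold for everything.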