label_categories
Relabel categories based on the top terms in each category.
This step relabels categories based on the most significant terms, or top_terms, within each category. It takes two columns as inputs: one with the old_labels, which can be single- or multi-valued categories, and one with the top_terms for each data point. How the labels are replaced depends on the rank method, which can be TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, or ALPHANUM, and on the number of top terms considered (specified by top_n).
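To make the idea concrete, here is a minimal Python sketch of TFIDF-style relabeling. It is not the step's implementation and rests on assumptions: each category is treated as one "document" built from the top_terms of its rows, each term is scored as its frequency within the category times log(number of categories / number of categories containing the term), and the new label joins the top_n highest-scoring terms with ", ". The function name relabel_by_tfidf is hypothetical.

```python
import math
from collections import Counter


def relabel_by_tfidf(old_labels, top_terms, top_n=2):
    """Hypothetical sketch, not the step's actual implementation.

    Treats each category as a document made of its rows' top terms,
    scores terms by TF-IDF and names the category after its top_n terms.
    """
    # Term counts per category; a multi-valued row counts toward every label it has.
    tf = {}
    for labels, terms in zip(old_labels, top_terms):
        for label in (labels if isinstance(labels, list) else [labels]):
            tf.setdefault(label, Counter()).update(terms)

    # Document frequency: number of categories in which each term occurs.
    n_cats = len(tf)
    df = Counter(term for counts in tf.values() for term in counts)

    # Build the old -> new label mapping from the top_n TF-IDF terms.
    mapping = {}
    for label, counts in tf.items():
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(n_cats / df[t]) for t, c in counts.items()}
        # Highest score first; ties are broken alphabetically.
        best = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
        mapping[label] = ", ".join(best)

    # Replace labels row by row, keeping the single- vs multi-valued shape.
    return [
        [mapping[l] for l in labels] if isinstance(labels, list) else mapping[labels]
        for labels in old_labels
    ]
```

For example, with old_labels ["a", "a", "b", ["a", "b"]] and top_terms [["price", "cheap"], ["price", "deal"], ["battery", "screen"], ["price", "battery"]], calling it with top_n=2 gives ["cheap, deal", "cheap, deal", "screen, battery", ["cheap, deal", "screen, battery"]]. Terms shared by both categories (price, battery) score zero, so the terms unique to one category rank first.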
Usage
The following examples show how the step can be used in a recipe.
To replace labels in a column of categories using TFIDF:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
The method used to rank the top terms.
Values must be one of the following:
TFIDF
BACKGROUND
FOREGROUND
UPLIFT
ORDINAL
ALPHANUM
The number of top terms considered for each label.
Values must be in the following range:
Whether the terms should be sorted in ascending order.
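The rank methods other than TFIDF are not described in detail on this page. The Python sketch below shows one plausible reading, and every definition in it is an assumption: FOREGROUND as the share of a category's rows containing a term, BACKGROUND as the share of all rows containing it, UPLIFT as their ratio, and ALPHANUM as plain alphabetical order. Here ascending=False puts the highest scores first (and, for ALPHANUM, reverses the alphabetical order); ORDINAL and TFIDF are left out. rank_terms is a hypothetical helper, not part of the step.

```python
from collections import Counter


def rank_terms(rows, category, rank="UPLIFT", top_n=3, ascending=False):
    """Hypothetical helper: rank one category's terms with assumed rank methods.

    rows is a list of (labels, terms) pairs, where labels is a list of categories.
    """
    if rank not in {"FOREGROUND", "BACKGROUND", "UPLIFT", "ALPHANUM"}:
        raise ValueError(f"rank method not covered by this sketch: {rank}")

    # Rows belonging to the category, and per-term row counts inside and overall.
    in_cat = [set(terms) for labels, terms in rows if category in labels]
    fg_counts = Counter(t for terms in in_cat for t in terms)
    bg_counts = Counter(t for _, terms in rows for t in set(terms))

    # Assumed definitions, not confirmed by the step's documentation.
    def score(term):
        fg = fg_counts[term] / len(in_cat)  # share of the category's rows with the term
        bg = bg_counts[term] / len(rows)    # share of all rows with the term
        if rank == "FOREGROUND":
            return fg
        if rank == "BACKGROUND":
            return bg
        return fg / bg  # UPLIFT: over-representation of the term in the category

    candidates = sorted(fg_counts)  # alphabetical base order, also used for ties
    if rank == "ALPHANUM":
        ordered = sorted(candidates, reverse=not ascending)
    else:
        # sorted() is stable even with reverse=True, so equal scores stay alphabetical.
        ordered = sorted(candidates, key=score, reverse=not ascending)
    return ordered[:top_n]
```

With rows [(["shoes"], ["leather", "size"]), (["shoes"], ["leather", "price"]), (["phones"], ["battery", "price"]), (["phones"], ["screen", "price"])], rank_terms(rows, "shoes", "UPLIFT", top_n=2) returns ["leather", "size"], whereas FOREGROUND returns ["leather", "price"]. Dividing by the background frequency demotes price, which is common to both categories.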