top_terms, within each category. It takes two columns as inputs: one with the old_labels, which can be single or multi-valued categories, and one with the top_terms for each data point. The replacement of the labels is influenced by the specified rank method, which can be TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, or ALPHANUM, and by the number of top terms considered (specified by top_n).
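To make the idea concrete, here is a minimal Python sketch of what relabelling categories by their top terms could look like with a TF-IDF style ranking. It is an illustration under stated assumptions, not the step's actual implementation: the function name top_terms_per_label, the smoothed scoring formula, the ", " separator used to build the new label, and the restriction to single-valued labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def top_terms_per_label(old_labels, top_terms, top_n=2):
    """Return {old_label: new_label} using a simple TF-IDF ranking of terms.

    Hypothetical helper: assumes single-valued labels and one list of
    terms per data point.
    """
    # Term frequency: count each term across all rows sharing a label.
    term_counts = defaultdict(Counter)
    for label, terms in zip(old_labels, top_terms):
        term_counts[label].update(terms)

    # Document frequency: number of labels in which each term occurs.
    doc_freq = Counter(t for counts in term_counts.values() for t in counts)
    n_labels = len(term_counts)

    mapping = {}
    for label, counts in term_counts.items():
        total = sum(counts.values())
        # Assumed scoring: relative frequency times a smoothed inverse
        # document frequency, so terms shared by every label still score > 0.
        scores = {
            t: (c / total) * (math.log(n_labels / doc_freq[t]) + 1.0)
            for t, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        mapping[label] = ", ".join(ranked[:top_n])
    return mapping


old_labels = ["a", "a", "b", "b"]
top_terms = [["cat", "pet"], ["cat", "vet"], ["car", "road"], ["car", "pet"]]

mapping = top_terms_per_label(old_labels, top_terms, top_n=2)
new_labels = [mapping[label] for label in old_labels]
print(new_labels)
```

In this toy data the term "pet" occurs under both labels, so it is ranked below terms that are specific to a single label and is left out of both new labels when top_n is 2.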
Usage
The following examples show how the step can be used in a recipe.
Examples
To replace labels in a column of categories using TFIDF:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
The output column. Its data type will depend on the ‘old_labels’ input column type.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
The method used to rank the top terms. Values must be one of the following:
- TFIDF
- BACKGROUND
- FOREGROUND
- UPLIFT
- ORDINAL
- ALPHANUM
The number of top terms considered for each label. Values must be in the following range:
Whether the terms should be sorted in ascending order.
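As a rough illustration of how these parameters fit together, the following Python sketch checks a configuration object before it is passed to the step. Only top_n and the six rank method values come from the text above. The key names "rank" and "ascending", the helper validate_config, and the rule that top_n must be a positive integer are assumptions made for this example, since the parameter names and the allowed range are not given here.

```python
# Rank method values listed in the parameter description.
RANK_METHODS = ("TFIDF", "BACKGROUND", "FOREGROUND", "UPLIFT", "ORDINAL", "ALPHANUM")


def validate_config(config):
    """Return a list of problems found in a configuration dict (empty if none).

    Hypothetical helper; the keys "rank" and "ascending" are assumed names.
    """
    problems = []

    rank = config.get("rank")  # assumed key name
    if rank is not None and rank not in RANK_METHODS:
        problems.append(f"rank must be one of {list(RANK_METHODS)}, got {rank!r}")

    top_n = config.get("top_n")
    if top_n is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(top_n, int) and not isinstance(top_n, bool)
        # Assumption: a positive integer; the documented range is not stated.
        if not is_int or top_n < 1:
            problems.append(f"top_n must be a positive integer, got {top_n!r}")

    ascending = config.get("ascending")  # assumed key name
    if ascending is not None and not isinstance(ascending, bool):
        problems.append(f"ascending must be true or false, got {ascending!r}")

    return problems


print(validate_config({"rank": "TFIDF", "top_n": 3, "ascending": False}))
print(validate_config({"rank": "tfidf", "top_n": 0}))
```

Rank method values are compared case-sensitively in this sketch, so "tfidf" is reported as a problem while "TFIDF" passes.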