top_terms, within
each category. It takes two columns as inputs: one with the old_labels, which can be single or multi-valued categories,
and one with the top_terms for each data point. The replacement of the labels is influenced by the specified rank method,
which can be TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, or ALPHANUM, and the number of top terms considered
(specified by top_n).
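The idea of relabelling categories by their highest-ranked terms can be sketched in plain Python. This is only an illustration under stated assumptions, not the step's actual implementation: the function name `relabel_with_top_terms`, the smoothed TF-IDF formula, the ", " separator used to join terms, and the restriction to single-valued labels are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def relabel_with_top_terms(old_labels, top_terms, top_n=2):
    """Return one new label per row, built from its category's top_n terms.

    Hypothetical helper: assumes single-valued labels and a list of terms per row.
    """
    # Term counts per category, pooled over all rows carrying that label.
    counts_by_label = defaultdict(Counter)
    for label, terms in zip(old_labels, top_terms):
        counts_by_label[label].update(terms)

    # Number of categories in which each term occurs at least once.
    n_labels = len(counts_by_label)
    label_freq = Counter()
    for counts in counts_by_label.values():
        label_freq.update(counts.keys())

    new_label = {}
    for label, counts in counts_by_label.items():
        total = sum(counts.values())
        # Smoothed TF-IDF (an assumed formula): high when a term is frequent
        # in this category and rare across the other categories.
        scores = {
            term: (n / total) * (math.log((1 + n_labels) / (1 + label_freq[term])) + 1)
            for term, n in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        new_label[label] = ", ".join(ranked[:top_n])

    return [new_label[label] for label in old_labels]


old_labels = ["a", "a", "b", "b"]
top_terms = [["cat", "pet"], ["cat", "food"], ["dog", "pet"], ["dog", "walk"]]
print(relabel_with_top_terms(old_labels, top_terms, top_n=2))
# ['cat, food', 'cat, food', 'dog, walk', 'dog, walk']
```

In this toy data the term "pet" occurs in both categories, so its score is lower than that of terms unique to one category, and it drops out of the top two for each label.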
Usage
The following examples show how the step can be used in a recipe.
Examples
 
To replace labels in a column of categories using TFIDF:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name e.g. "churn-clf").
Inputs
Outputs
The output column. Its data type will depend on the ‘old_labels’ input column type.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
The method used to rank the top terms. Values must be one of the following:
TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, ALPHANUM
The number of top terms considered for each label. Values must be in the following range:
Whether the terms should be sorted in ascending order.