This step relabels categories using the most significant terms (top_terms) within each category. It takes two columns as inputs: one with the old_labels, which can be single- or multi-valued categories, and one with the top_terms for each data point. The new labels depend on the specified rank method (TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, or ALPHANUM) and on the number of top terms considered (specified by top_n).
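
The general idea can be illustrated with a minimal Python sketch. This is not the step's actual implementation: the function name `relabel_by_top_terms`, the `sep` argument, and the plain frequency ranking are assumptions made for illustration (the real rank methods such as TFIDF or UPLIFT are not reproduced here), and only single-valued labels are handled.

```python
from collections import Counter, defaultdict


def relabel_by_top_terms(old_labels, top_terms, top_n=2, sep=", "):
    """Replace each category label with its top_n most frequent terms.

    Hypothetical helper: ranks terms by raw frequency within a category,
    which stands in for the step's configurable rank methods.
    """
    # Count how often each term occurs among the rows of each category.
    counts = defaultdict(Counter)
    for label, terms in zip(old_labels, top_terms):
        counts[label].update(terms)

    new_names = {}
    for label, counter in counts.items():
        # Sort by descending count, then alphabetically, so ties are deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        new_names[label] = sep.join(term for term, _ in ranked[:top_n])

    # Map every row's old label to the new name of its category.
    return [new_names[label] for label in old_labels]


old_labels = ["c0", "c0", "c1", "c1"]
top_terms = [
    ["price", "discount"],
    ["price", "sale"],
    ["delivery", "late"],
    ["delivery", "courier"],
]
new_labels = relabel_by_top_terms(old_labels, top_terms, top_n=2)
print(new_labels)
```

In this sketch a larger top_n produces longer, more descriptive labels, while top_n=1 names each category after its single most frequent term.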

Usage

The following examples show how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. step(..., {"param": "value", ...}) -> (output).
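
As a rough illustration of what such a configuration object might look like, the Python sketch below builds one and serialises it to JSON. The key top_n is named in the step description; the key "rank_method" is a hypothetical name chosen for this example and may not match the step's actual parameter name.

```python
import json

# "top_n" appears in the step description above.
# "rank_method" is a hypothetical key used only for illustration.
config = {"rank_method": "TFIDF", "top_n": 3}

# Serialise to the JSON object that would be passed as the last "input".
config_json = json.dumps(config)
print(config_json)
```

The resulting object would take the place of {"param": "value", ...} in the step call.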