This step relabels categories using the most significant terms (top_terms) within each category. It takes two input columns: one with the old_labels, which can be single- or multi-valued categories, and one with the top_terms for each data point. The new labels depend on the rank method used to order the terms (TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, or ALPHANUM) and on the number of top terms considered for each label (top_n).

Usage

The following examples show how the step can be used in a recipe.

Examples

To replace labels in a column of categories using TFIDF:
label_categories(ds.old_labels, ds.top_terms) -> (ds.new_labels)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
old_labels
column[category|list]
required
A column containing the old labels, which could be single-value or multi-value categories.
top_terms
column[category|text|list]
required
A column containing lists of top terms for each data point.
new_labels
column
required
The output column. Its data type will depend on the ‘old_labels’ input column type.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

rank_method
string
default:"TFIDF"
The method used to rank the top terms. Values must be one of the following: TFIDF, BACKGROUND, FOREGROUND, UPLIFT, ORDINAL, ALPHANUM.
top_n
integer
default:"4"
The number of top terms considered for each label. Values must be in the following range:
1 ≤ top_n < inf
ascending
boolean
default:"false"
Whether the terms should be sorted in ascending order.
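
As an illustration of how these parameters are passed, the following sketch reuses the column names from the usage example and adds a configuration object as the last input. The specific values (UPLIFT, 3, false) are arbitrary choices made for this example, not recommended settings.

```
label_categories(ds.old_labels, ds.top_terms, {"rank_method": "UPLIFT", "top_n": 3, "ascending": false}) -> (ds.new_labels)
```

Any parameter left out of the object falls back to its default, so passing only {"top_n": 3} would keep rank_method at "TFIDF".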