# label_categories

> Relabel categories based on the top terms in each category. 

This step relabels categories using the most significant terms (`top_terms`) within each category.
It takes two input columns: `old_labels`, which may hold single-valued or multi-valued categories,
and `top_terms`, which holds the top terms for each data point. Two parameters control how the new
labels are built: `rank_method` (one of `TFIDF`, `BACKGROUND`, `FOREGROUND`, `UPLIFT`, `ORDINAL`, or `ALPHANUM`)
determines how terms are ranked, and `top_n` sets how many of the top-ranked terms are considered.
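
To make the idea concrete, here is a rough Python sketch of relabeling by top terms. It is not Graphext's implementation: the function name `relabel_by_top_terms`, the use of raw term frequency as the ranking score, and the `" / "` separator used to join terms are all assumptions made for illustration. The step's actual rank methods are likely to score terms differently, and the sketch handles single-valued labels only.

```python
from collections import Counter, defaultdict


def relabel_by_top_terms(old_labels, top_terms, top_n=4, sep=" / "):
    """Map each old label to a new label built from its most frequent terms.

    Illustrative only: terms are ranked by raw frequency within each
    category, a stand-in for the step's real rank methods.
    """
    # Count how often each term occurs among the rows of each category.
    counts = defaultdict(Counter)
    for label, terms in zip(old_labels, top_terms):
        counts[label].update(terms)

    mapping = {}
    for label, counter in counts.items():
        # Sort by descending count, then alphabetically so ties are deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        mapping[label] = sep.join(term for term, _ in ranked[:top_n])

    # Every row receives the new label of the category it belonged to.
    return [mapping[label] for label in old_labels]


# Hypothetical data: three rows, two categories.
old = ["c1", "c1", "c2"]
terms = [["price", "cheap"], ["price", "offer"], ["delivery", "late"]]
new = relabel_by_top_terms(old, terms, top_n=2)
print(new)  # ['price / cheap', 'price / cheap', 'delivery / late']
```

Rows that shared an old label still share a label afterwards; only the label text changes, and a larger `top_n` yields longer, more descriptive labels.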

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      To replace labels in a column of categories using the default rank method (`TFIDF`):

      ```stan theme={null}
      label_categories(ds.old_labels, ds.top_terms) -> (ds.new_labels)
      ```
    </Tab>

    <Tab title="Example 2">
      To replace labels in a column of categories using BACKGROUND:

      ```stan theme={null}
      label_categories(ds.old_labels, ds.top_terms, {
        rank_method: 'BACKGROUND',
        top_n: 3
      }) -> (ds.new_labels)
      ```
    </Tab>

    <Tab title="Example 3">
      To replace labels in a column of categories using BACKGROUND, with terms sorted in ascending order:

      ```stan theme={null}
      label_categories(ds.old_labels, ds.top_terms, {
        rank_method: 'BACKGROUND',
        top_n: 3,
        ascending: true
      }) -> (ds.new_labels)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details see the sections below.

      ```stan theme={null}
      label_categories(old_labels: category|list, top_terms: category|text|list, {
          "param": value,
          ...
      }) -> (new_labels: column)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="old_labels" type="column[category|list]" required>
    A column containing the old labels, which could be single-value or multi-value categories.
  </ParamField>

  <ParamField path="top_terms" type="column[category|text|list]" required>
    A column containing lists of top terms for each data point.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="new_labels" type="column" required>
    The output column. Its data type depends on the type of the `old_labels` input column.
  </ParamField>
</Accordion>

## Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object
passed as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="rank_method" type="string" default="TFIDF">
    The method used to rank the top terms.

    Values must be one of the following:

    `TFIDF` `BACKGROUND` `FOREGROUND` `UPLIFT` `ORDINAL` `ALPHANUM`
  </ParamField>

  <ParamField path="top_n" type="integer" default="4">
    The number of top terms considered for each label.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ top_n < inf
    ```
  </ParamField>

  <ParamField path="ascending" type="boolean" default="false">
    Whether the terms should be sorted in ascending order.
  </ParamField>
</Accordion>
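
The constraints listed above can be checked before a recipe is run. The following is a minimal Python sketch and is not part of Graphext: `validate_config` is a hypothetical helper, and the defaults and allowed values are copied from the parameter descriptions in this section. It also assumes `rank_method` values are case-sensitive, which the examples suggest but do not state.

```python
# Allowed values and defaults as documented in the parameter list above.
RANK_METHODS = {"TFIDF", "BACKGROUND", "FOREGROUND", "UPLIFT", "ORDINAL", "ALPHANUM"}
DEFAULTS = {"rank_method": "TFIDF", "top_n": 4, "ascending": False}


def validate_config(config=None):
    """Return the config merged with defaults, or raise ValueError if invalid."""
    merged = {**DEFAULTS, **(config or {})}

    unknown = set(merged) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if merged["rank_method"] not in RANK_METHODS:
        raise ValueError(f"rank_method must be one of {sorted(RANK_METHODS)}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    top_n = merged["top_n"]
    if isinstance(top_n, bool) or not isinstance(top_n, int) or top_n < 1:
        raise ValueError("top_n must be an integer >= 1")
    if not isinstance(merged["ascending"], bool):
        raise ValueError("ascending must be a boolean")
    return merged
```

For example, `validate_config({"rank_method": "BACKGROUND", "top_n": 3})` returns the full configuration with `ascending` filled in as `False`, while `validate_config({"top_n": 0})` raises a `ValueError`.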
