> ## Documentation Index
> Fetch the complete documentation index at: https://docs.graphext.com/llms.txt
> Use this file to discover all available pages before exploring further.

# classify_text

> Classify texts using any model from the [Hugging Face hub](https://huggingface.co/models). 

Note that we do not validate the model name before executing it, so make sure it
corresponds to an existing model in the hub, otherwise the step will fail.

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      To infer the ternary sentiment of tweets using a CardiffNLP model

      ```stan theme={null}
      classify_text(ds.text, {"model": "cardiffnlp/twitter-roberta-base-sentiment"}) -> (ds.text_sentiment)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs and outputs the step is expected to receive and will produce respectively. For futher details see sections below.

      ```stan theme={null}
      classify_text(text: text, {
          "param": value,
          ...
      }) -> (class: category)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="text" type="column[text]" required>
    A column of texts to classify.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="class" type="column[category]" required>
    The inferred class of each text. The labels of individual categories depend on the seleted model,
    and/or can be specified manually using the `labels` parameter (see below).
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a json object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="model" type="string" required>
    The name of a model.
    This should be the full name (including the organization if applicable) of a model in the
    [Hugging Face model hub](https://huggingface.co/models). You can copy it by clicking on the
    icon next to the model's name on its dedicated web page.

    Note that if the name doesn't correspond to a model existing in the hub the step will fail.
    Since there are hundreds if not thousands of potential models, we cannot validate if the
    name is correct before executing it.
  </ParamField>

  <ParamField path="revision" type="[string, null]">
    The specific model version.
    Can be a branch name, a tag name, or a commit id. To identify a particular revision, on
    a model's webpage (such as [https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual)),
    browse to the [Files and versions tab](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual/tree/main),
    and use the branch or history dropdown menus to see the available branch names or commit IDs.
    If not provided, will use the latest (newest) available version (usually from the "main" branch).
  </ParamField>

  <ParamField path="labels" type="[object, null]">
    Map original model output to human-readable labels.
    Unfortunately, many models in Hugging Face are badly configured and output labels like `LABEL_0`,
    `LABEL_1`, etc. which isn't very helpful. You can use the "Hosted inference API"
    widget on the model's web page to test its output labels. If necessary, use this parameter
    to map the default output labels to ones you prefer.

    <Accordion title="Item properties">
      <ParamField path="Items" type="string">
        One or more additional parameters.
      </ParamField>
    </Accordion>

    <Accordion title="Examples">
      * E.g. to map ternary sentiment labels

      ```json theme={null}
      "labels": {
      "LABEL_0": "negative",
      "LABEL_1": "neutral",
      "LABEL_2": "positive"
      }
      ```
    </Accordion>
  </ParamField>

  <ParamField path="min_prob" type="[number, null]">
    Minimum probability (score) to accept prediction label.
    Class labels with a corresponding probability smaller than this value will be removed
    (replaced with NaN, i.e. the missing value).

    Values must be in the following range:

    ```javascript theme={null}
    0.0 < min_prob < 1.0
    ```
  </ParamField>

  <ParamField path="batch_size" type="integer" default="8">
    How many texts to process simultaneously.
    May get ignored when running on CPU.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ batch_size ≤ 64
    ```
  </ParamField>

  <ParamField path="n_workers" type="integer">
    Number of threads used to feed GPU with texts.

    Values must be in the following range:

    ```javascript theme={null}
    1 ≤ n_workers ≤ 4
    ```
  </ParamField>

  <ParamField path="device" type="[integer, null]">
    Which CPU/GPU to run model on.
    Pass -1 to use CPU, and 0 to use first available GPU. By default, of
    when passed `null`, the step will use GPU automatically if one is found
    otherwise CPU.
  </ParamField>

  <ParamField path="integration" type="string">
    ID of a Hugging Face integration configured in Graphext.
    To use a private model from the Hugging Face hub, you need to configure a
    Hugging Face "API Key" integration (in the relevant Graphext team > Add Integration

    > API KEYS > Add API Key > Hugging Face > paste an access token previously
    > configured in your huggingface account). Graphext will automatically assign
    > an ID to your integration which gets autocompleted where required (e.g. in the
    > recipe editor).
  </ParamField>
</Accordion>
