
Classify text

NLP · inference · classification · model · text · hugging face

Classify texts using any model from the Hugging Face hub.

Note that we do not validate the model name before executing the step, so make sure it corresponds to an existing model in the hub; otherwise the step will fail.

Experimental

This function is still in the experimental stage, and we do not guarantee that it won't fail for some combinations of models and parameters. Feel free to get in touch if you have problems using it.

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
classify_text(text: text, {"param": value}) -> (class: category)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.

Example

To infer the ternary sentiment of tweets using a CardiffNLP model:

Example call (in recipe editor)
classify_text(ds.text, {"model": "cardiffnlp/twitter-roberta-base-sentiment"}) -> (ds.text_sentiment)

Inputs


text: column:text

A column of texts to classify.

Outputs


class: column:category

The inferred class of each text. The labels of individual categories depend on the selected model, and/or can be specified manually using the labels parameter (see below).

Parameters


model: string

The name of a model. This should be the full name (including the organization if applicable) of a model in the Hugging Face model hub. You can copy it by clicking on the icon next to the model's name on its dedicated web page.

Note that if the name doesn't correspond to an existing model in the hub, the step will fail. Since there are hundreds if not thousands of potential models, we cannot validate whether the name is correct before executing the step.


labels: object | null

Map original model output to human-readable labels. Unfortunately, many models in Hugging Face are badly configured and output labels like LABEL_0, LABEL_1, etc. which isn't very helpful. You can use the "Hosted inference API" widget on the model's web page to test its output labels. If necessary, use this parameter to map the default output labels to ones you prefer.

Items in labels

*param: string

Each key is an original model label, and each value is the human-readable label it should be mapped to. All values should be strings.

Example parameter values:

  • E.g. to map ternary sentiment labels:

    "labels": {
      "LABEL_0": "negative",
      "LABEL_1": "neutral",
      "LABEL_2": "positive"
    }
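As a sketch of what such a mapping does (not the step's actual implementation), applying it in plain Python might look like this:

```python
# Hypothetical sketch of how a labels mapping is applied to raw model outputs.
# The dict mirrors the ternary sentiment example above.
labels = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}

raw_outputs = ["LABEL_2", "LABEL_0", "LABEL_1"]

# Labels without an entry in the mapping would fall through unchanged.
mapped = [labels.get(label, label) for label in raw_outputs]
print(mapped)  # ['positive', 'negative', 'neutral']
```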
    

min_prob: number | null

Minimum probability (score) required to accept a predicted label. Class labels with a probability smaller than this value will be removed (replaced with NaN, i.e. the missing value).

Range: 0.0 < min_prob < 1.0
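A minimal sketch of this behaviour, assuming predictions come as (label, score) pairs (the names here are illustrative, not the step's internals):

```python
import math

min_prob = 0.7

# Illustrative (label, score) pairs, as a classifier might return them.
predictions = [("positive", 0.93), ("neutral", 0.41), ("negative", 0.85)]

# Labels whose score falls below min_prob are replaced with NaN (missing).
classes = [label if score >= min_prob else math.nan
           for label, score in predictions]
print(classes)  # ['positive', nan, 'negative']
```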


batch_size: integer

How many texts to process simultaneously. May be ignored when running on a CPU.

Range: 1 ≤ batch_size ≤ 64
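Conceptually, batching just slices the input column into chunks of at most batch_size texts before they are fed to the model. A sketch, not the actual implementation:

```python
def batches(texts, batch_size=32):
    """Yield consecutive chunks of at most batch_size texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

texts = [f"tweet {n}" for n in range(70)]
sizes = [len(chunk) for chunk in batches(texts, batch_size=32)]
print(sizes)  # [32, 32, 6]
```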


n_workers: integer

Number of threads used to feed the GPU with texts.

Range: 1 ≤ n_workers ≤ 4


device: integer | null

Which CPU/GPU to run the model on. Pass -1 to use the CPU, and 0 to use the first available GPU. By default, or when passed null, the step will automatically use a GPU if one is found, and the CPU otherwise.
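The selection logic described above can be sketched as follows (resolve_device is a hypothetical helper, not part of the step's API):

```python
def resolve_device(device=None, gpu_available=False):
    """Return -1 for CPU or a GPU index, following the documented defaults."""
    if device is None:
        # Auto mode: use the first GPU if one is found, otherwise the CPU.
        return 0 if gpu_available else -1
    return device

print(resolve_device(None, gpu_available=True))   # 0
print(resolve_device(None, gpu_available=False))  # -1
print(resolve_device(-1, gpu_available=True))     # -1
```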
