
# label_texts_containing

> Categorize texts containing specific keywords with custom labels. 

Assigns each text to one or more categories. Each category is defined by a list of keywords that a text must
include, or must not include, for the label to apply. Each category can also specify whether keywords are
matched case-sensitively, accent-sensitively, and as whole words only. See the parameters below for details.

## Usage

The following example shows how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      The following defines the keywords to be included or excluded for each of three categories, labelled "journalists", "business" and "CEOs". Note that for "CEOs" we match only the capitalized spelling "CEO".

      ```stan theme={null}
      label_texts_containing(ds.text, {
        "journalists": {
          "include": ["journalist", "journalism", "news"],
          "exclude": ["blogger"],
          "case_sensitive": false
        },
        "business": {
          "include":["startup", "entrepreneur", "founder"]
        },
        "CEOs": {
          "include": ["CEO"],
          "case_sensitive": true
        }
      }) -> (ds.field_of_occupation)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects and the outputs it produces. For further details see the sections below.

      ```stan theme={null}
      label_texts_containing(text_col: text|category, {
          "param": value,
          ...
      }) -> (labels: list[category])
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="text_col" type="column[text|category]" required>
    A text column to label.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="labels" type="column[list[category]]" required>
    A column containing the labels assigned to each text.
  </ParamField>
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="Categories" type="object">
    One or more named text categories.
    Each key is the name/label to show for a specific text category, and its value is an object
    specifying the terms a text must or must not contain for that particular label to apply.
    See also the examples above.

    <Accordion title="Properties">
      <ParamField path="include" type="array[string]">
        List of strings a text must include to apply a label.

        <Accordion title="Array items">
          <ParamField path="Item" type="string">
            Each item in array.
          </ParamField>
        </Accordion>
      </ParamField>

      <ParamField path="exclude" type="array[string]">
        List of strings a text must not include to apply a label.

        <Accordion title="Array items">
          <ParamField path="Item" type="string">
            Each item in array.
          </ParamField>
        </Accordion>
      </ParamField>

      <ParamField path="accent_sensitive" type="boolean" default="false">
        Whether to make search accent-sensitive.
      </ParamField>

      <ParamField path="case_sensitive" type="boolean" default="false">
        Whether to make search case-sensitive.
      </ParamField>

      <ParamField path="whole_words" type="boolean" default="true">
        Whether to match whole words only.
        If enabled, only matches a word if it is surrounded by non-alphanumeric characters.
      </ParamField>
    </Accordion>
  </ParamField>
</Accordion>
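
As a rough sketch of how the remaining options might be combined, the snippet below sets `accent_sensitive` and `whole_words` per category. The column names `ds.review` and `ds.topics` and the category names are hypothetical, and the intended behaviour is inferred from the parameter descriptions above rather than verified: "cafes" should apply only to texts containing the accented spelling "café", while "art" should also apply when "art" appears inside a longer word, unless the text contains "artificial".

```stan theme={null}
label_texts_containing(ds.review, {
  "cafes": {
    "include": ["café"],
    "accent_sensitive": true
  },
  "art": {
    "include": ["art"],
    "exclude": ["artificial"],
    "whole_words": false
  }
}) -> (ds.topics)
```

Because the options are set inside each category object, one category can use strict matching while another stays permissive within the same step call.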
