Given a query of the form “word1; word2 OR word3”, texts containing “word1” will be labeled as “word1”, and texts containing “word2” or “word3” will be labeled as “word2 OR word3”. In other words, each semicolon-separated string acts as both query and corresponding label. Texts matching multiple queries will be assigned multiple labels.
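The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the step's actual implementation: `label_text` is a hypothetical helper, and it uses plain substring matching, ignoring the case, accent and whole-word options described below.

```python
def label_text(text: str, query: str) -> list[str]:
    """Return the label of every semicolon-separated sub-query found in `text`.

    Hypothetical helper for illustration only; matching is a plain
    case-sensitive substring test.
    """
    labels = []
    for part in query.split(";"):
        label = part.strip()
        if not label:
            continue
        # Each sub-query is a list of alternatives joined by "OR".
        keywords = [kw.strip() for kw in label.split(" OR ")]
        if any(kw and kw in text for kw in keywords):
            # The sub-query string itself doubles as the label.
            labels.append(label)
    return labels


labels = label_text("this text has word1 and word3", "word1; word2 OR word3")
```

Here the text matches both sub-queries, so it receives both labels.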

query
string
required

The query used to label texts. It is a string of labels/categories and their associated keywords (see examples below). Use “;” to separate categories, “OR” to join alternative words within a category, and “-” to exclude words from a category. Category labels are formed from the query itself, e.g. a text containing “AA” and “BB” will be tagged as [AA,BB].
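As a rough sketch of how such a query might be interpreted, the snippet below parses one category into included and excluded words. It rests on assumptions not confirmed by this documentation: exclusions are written as a “-” prefix directly attached to a word, keywords are single words, and texts are split on whitespace. The names `parse_category`, `matches` and `tag` are hypothetical.

```python
def parse_category(category: str) -> tuple[list[str], list[str]]:
    """Split one category into (included words, excluded words).

    Assumption: an excluded word is written with a leading "-", e.g. "-juice".
    """
    include, exclude = [], []
    for term in category.split(" OR "):
        for token in term.split():
            if token.startswith("-") and len(token) > 1:
                exclude.append(token[1:])
            else:
                include.append(token)
    return include, exclude


def matches(text: str, category: str) -> bool:
    """True if the text has any included word and none of the excluded ones."""
    include, exclude = parse_category(category)
    words = set(text.split())
    return any(w in words for w in include) and not any(w in words for w in exclude)


def tag(text: str, query: str) -> list[str]:
    """Collect the label of every category in the query that matches the text."""
    categories = [c.strip() for c in query.split(";") if c.strip()]
    return [c for c in categories if matches(text, c)]


tags = tag("AA and BB", "AA; BB")
```

With this reading, a text containing both “AA” and “BB” receives both tags, while a category such as “apple OR pear -juice” would skip texts that mention “juice”.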

accent_sensitive
boolean

Whether the search should be accent sensitive.

case_sensitive
boolean

Whether the search should be case sensitive.
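One common way to implement accent- and case-insensitive search is to normalize both the text and the keyword before comparing them. The sketch below shows that idea using Python's standard `unicodedata` module. The `normalize` helper is hypothetical and not necessarily how this step works internally; both flags are passed explicitly because their defaults are not stated here.

```python
import unicodedata


def normalize(s: str, *, case_sensitive: bool, accent_sensitive: bool) -> str:
    """Prepare a string for comparison under the given sensitivity flags."""
    if not accent_sensitive:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", s)
        s = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching.
        s = s.casefold()
    return s


# Apply the same normalization to the text and the keyword before matching.
text = normalize("Un Caf\u00e9 solo", case_sensitive=False, accent_sensitive=False)
keyword = normalize("cafe", case_sensitive=False, accent_sensitive=False)
found = keyword in text
```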

whole_words
boolean
default: "true"

Whether to match whole words only. If enabled, only matches a word if it is surrounded by non-alphanumeric characters.
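The whole-word rule can be approximated with a regular expression that refuses a match when an alphanumeric character sits directly before or after the keyword. This is a minimal sketch with a hypothetical `contains` helper. It assumes that the start and end of the text also count as word boundaries, which the description does not state explicitly.

```python
import re


def contains(text: str, keyword: str, whole_words: bool = True) -> bool:
    """Check whether `keyword` occurs in `text`, optionally as a whole word."""
    pattern = re.escape(keyword)
    if whole_words:
        # [^\W_] matches any alphanumeric character, so the lookarounds
        # reject matches that are glued to a letter or digit.
        pattern = rf"(?<![^\W_]){pattern}(?![^\W_])"
    return re.search(pattern, text) is not None


whole = contains("the cat sat", "cat")                               # standalone word
partial = contains("concatenate", "cat")                             # embedded in a longer word
relaxed = contains("concatenate", "cat", whole_words=False)          # substring search
```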

first_only
boolean

Whether to return only the first match. If True, only the first match will be assigned to each text. The result will be a simple categorical column. If False, all identified matches will be assigned to each text. The result will be a multivalued column containing lists of categories.
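The difference between the two output shapes can be illustrated as follows. The `assign` helper is hypothetical, and the sketch makes two assumptions that this documentation does not confirm: “first match” is taken to mean the first matching category in the order it was found, and a text without any match is given `None`.

```python
from typing import Optional, Union


def assign(matched: list[str], first_only: bool) -> Union[Optional[str], list[str]]:
    """Turn the list of matched categories into the value stored for one text."""
    if first_only:
        # Single category per text: suitable for a simple categorical column.
        return matched[0] if matched else None
    # All categories per text: suitable for a multivalued (list) column.
    return list(matched)


per_text_matches = [["AA", "BB"], ["BB"], []]
categorical = [assign(m, first_only=True) for m in per_text_matches]
multivalued = [assign(m, first_only=False) for m in per_text_matches]
```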