label_texts_containing_from_query
Label texts given an elastic-like query string.
Given a query of the form “word1; word2 OR word3”, texts containing “word1” will be labeled as “word1”, and texts containing “word2” or “word3” will be labeled as “word2 OR word3”. In other words, each semicolon-separated string acts as both query and corresponding label. Texts matching multiple queries will be assigned multiple labels.
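The labeling rule described above can be sketched in plain Python. This is only an illustration of the semantics, not the step's actual implementation: the helper names `parse_query` and `label_text` are hypothetical, and the sketch assumes case-insensitive substring matching.

```python
def parse_query(query):
    """Split a query into (label, keywords) pairs.

    Each semicolon-separated clause is used verbatim as the label,
    and its keywords are the terms joined by " OR ".
    """
    categories = []
    for clause in query.split(";"):
        label = clause.strip()
        if not label:
            continue
        keywords = [word.strip() for word in label.split(" OR ") if word.strip()]
        categories.append((label, keywords))
    return categories


def label_text(text, query):
    """Return every label whose clause has at least one keyword in the text."""
    lowered = text.lower()  # assumption: matching ignores case
    return [
        label
        for label, keywords in parse_query(query)
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


query = "word1; word2 OR word3"
print(label_text("a text with word1 and word3", query))
print(label_text("a text with word2 only", query))
```

A text matching both clauses receives both labels, while a text matching neither receives an empty list.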
Usage
The following example shows how the step can be used in a recipe.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Query to label. Query is a string of labels/categories and associated keywords (see examples below). Use “;” to separate categories, “OR” to join words for a category, and “-” to exclude words from a category. The category label(s) will be formed using the query, e.g. a text containing “AA” and “BB” will be tagged as [AA,BB].
Whether to make search accent sensitive.
Whether to make search case sensitive.
Whether to match whole words only. If enabled, only matches a word if it is surrounded by non-alphanumeric characters.
Whether to return only the first match. If True, only the first match will be assigned to each text. The result will be a simple categorical column. If False, all identified matches will be assigned to each text. The result will be a multivalued column containing lists of categories.
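As a rough illustration of how these options might interact, here is a self-contained Python sketch. It is not the step's implementation. The keyword names `accent_sensitive`, `case_sensitive`, `whole_word` and `first_match` are hypothetical stand-ins for the parameters described above. The sketch also assumes that keywords are single tokens, that an excluded word is written as a token prefixed with “-”, and that “first match” means the first matching category in query order.

```python
import unicodedata


def strip_accents(s):
    """Remove combining marks, e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize(s, accent_sensitive, case_sensitive):
    if not accent_sensitive:
        s = strip_accents(s)
    if not case_sensitive:
        s = s.lower()
    return s


def contains(text, word, whole_word):
    """Substring search; optionally require non-alphanumeric neighbours."""
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if not whole_word or (before_ok and after_ok):
            return True
        start = text.find(word, start + 1)
    return False


def parse_query(query):
    """Return (label, included words, excluded words) per category."""
    categories = []
    for clause in query.split(";"):
        label = clause.strip()
        if not label:
            continue
        include, exclude = [], []
        for token in label.split():
            if token == "OR":
                continue
            if token.startswith("-") and len(token) > 1:
                exclude.append(token[1:])  # assumed exclusion syntax
            else:
                include.append(token)
        categories.append((label, include, exclude))
    return categories


def label_text(text, query, *, accent_sensitive=False, case_sensitive=False,
               whole_word=False, first_match=False):
    def norm(s):
        return normalize(s, accent_sensitive, case_sensitive)

    haystack = norm(text)
    labels = []
    for label, include, exclude in parse_query(query):
        hit = any(contains(haystack, norm(w), whole_word) for w in include)
        blocked = any(contains(haystack, norm(w), whole_word) for w in exclude)
        if hit and not blocked:
            labels.append(label)
    if first_match:
        # Single categorical value instead of a list of categories.
        return labels[0] if labels else None
    return labels


print(label_text("AA and BB", "AA; BB"))
print(label_text("AA and BB", "AA; BB", first_match=True))
```

With `first_match` disabled the result is a list of categories per text (a multivalued column); with it enabled the result is a single category or nothing.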