label_texts_containing

Assigns each text to one or more categories. Each category is defined by a list of keywords that a text must include, and optionally a list of keywords it must not contain, for the label to apply. Each category may also specify whether its keywords are matched case-sensitively or regardless of case (lower, upper, etc.). See the parameters below for further details.

Usage

The following example shows how the step can be used in a recipe.

Examples

The following defines the keywords to be included or excluded for each of three categories, labelled “journalists”, “business” and “CEOs”. Note how in the case of “CEOs” we’re looking for occurrences of the capitalised spelling only.

label_texts_containing(ds.text, {
  "journalists": {
    "include": ["journalist", "journalism", "news"],
    "exclude": ["blogger"],
    "case_sensitive": false
  },
  "business": {
    "include": ["startup", "entrepreneur", "founder"]
  },
  "CEOs": {
    "include": ["CEO"],
    "case_sensitive": true
  }
}) -> (ds.field_of_occupation)
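
To make the include/exclude and case-sensitivity rules concrete, the following is a minimal Python sketch of how such labelling could behave. It is not the step's actual implementation. The function name `label_text` is hypothetical, and the sketch assumes that a category applies when at least one "include" keyword occurs as a substring and no "exclude" keyword does, and that matching is case-insensitive unless "case_sensitive" is true. The real step may differ on any of these points.

```python
from typing import Dict, List


def label_text(text: str, categories: Dict[str, dict]) -> List[str]:
    """Return the names of all categories whose keyword rules the text satisfies.

    Assumptions (not confirmed by the step's documentation):
    - keywords are matched as plain substrings,
    - any single "include" keyword is enough to match,
    - any single "exclude" keyword vetoes the category,
    - matching ignores case unless "case_sensitive" is True.
    """
    labels = []
    for name, rule in categories.items():
        case_sensitive = rule.get("case_sensitive", False)  # assumed default
        haystack = text if case_sensitive else text.lower()

        def norm(keyword: str) -> str:
            # Lower-case the keyword only when case is being ignored.
            return keyword if case_sensitive else keyword.lower()

        included = any(norm(kw) in haystack for kw in rule.get("include", []))
        excluded = any(norm(kw) in haystack for kw in rule.get("exclude", []))
        if included and not excluded:
            labels.append(name)
    return labels


# Same categories as in the recipe example above.
categories = {
    "journalists": {
        "include": ["journalist", "journalism", "news"],
        "exclude": ["blogger"],
        "case_sensitive": False,
    },
    "business": {
        "include": ["startup", "entrepreneur", "founder"],
    },
    "CEOs": {
        "include": ["CEO"],
        "case_sensitive": True,
    },
}

texts = [
    "Freelance journalist covering tech news",
    "Founder and CEO of a fintech startup",
    "Blogger writing about journalism",
    "Our ceo loves hiking",
]
for t in texts:
    print(t, "->", label_text(t, categories))
```

Under these assumptions, the second text receives both "business" and "CEOs", the third receives no label because "blogger" is excluded, and the fourth receives no label because the lowercase "ceo" does not match the case-sensitive keyword.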

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters
