extract_regex

A regular expression (also called a regex or regex pattern) is a sequence of characters that forms a search pattern. The pattern is compared against texts, and any matches are returned. Matches don’t have to be returned as found; they can be formatted using the output parameter. Check the references below to familiarize yourself with the regex language:

Also see the pattern parameter below for more details.

Usage

The following example shows how the step can be used in a recipe.

Examples

Extract all Twitter mentions with handles between 1 and 15 characters long into lists of mentions.

extract_regex(ds.text, {
  "pattern": "@\\w{1,15}",
  "extract_all": true
}) -> (ds.mentions)
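
To get a feel for what this pattern matches, the same regex can be tried outside a recipe. The following is a minimal Python sketch, not the step's implementation: the helper `extract_mentions` and the sample text are hypothetical, and it assumes the step's regex dialect treats `@\w{1,15}` the way Python's `re` module does. What the step returns when `extract_all` is false is also an assumption here (first match only).

```python
import re

# Same pattern as in the recipe example. In the JSON config the backslash
# has to be escaped ("\\w"); a Python raw string needs only one.
MENTION_PATTERN = re.compile(r"@\w{1,15}")


def extract_mentions(text, extract_all=True):
    """Hypothetical helper: return every match as a list, or only the first.

    Assumption: with extract_all disabled, only the first match is returned
    (None if the text contains no match).
    """
    if extract_all:
        return MENTION_PATTERN.findall(text)
    match = MENTION_PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "Thanks @alice and @bob_99 for the help!"
    print(extract_mentions(sample))
    print(extract_mentions(sample, extract_all=False))
```

Note that the pattern has no trailing boundary, so in Python a handle longer than 15 characters is not rejected; the match simply stops after the first 15 word characters.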

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters
