replace_regex

A regular expression (regex) is a sequence of characters that forms a search pattern. The step compares this pattern against each text and substitutes every match with the desired replacement. The replacement can be a constant text string or a formatting pattern that references all or part of the matched character sequence.

Replacing one fixed text string with another is straightforward. E.g., to replace all occurrences of “hi” with “hello”, use {"pattern": "hi", "replacement": "hello"}.

Capturing groups in the pattern and replacement parameters allow for much greater flexibility. For example, if a column of texts includes Twitter mentions of the form “@abc”, the regular expression "pattern": "@(\\w*)" will match these mentions and save the name without the “@” character in a capturing group. Using "replacement": "{1}" then replaces each matched mention with only the name part of the handle, effectively removing the “@” from all mentions. Note that \\w* also matches zero characters, so an “@” standing on its own is matched and removed too; use \\w+ if an “@” that is not followed by word characters should be left untouched.
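The mention example can be tried outside the platform with Python's standard `re` module. This is only a sketch under assumptions: the regex engine behind the step is not stated here, Python writes a group reference as `\1` where the step uses `{1}`, and the sample text is made up. It also shows how `\w*` and `\w+` differ when an “@” is not followed by a name.

```python
import re

# Hypothetical sample text, made up for illustration.
text = "Thanks @abc and @data_team! Meet @ noon."

# \w* also matches zero characters, so the lone "@" is removed as well.
loose = re.sub(r"@(\w*)", r"\1", text)
print(loose)

# \w+ requires at least one word character, so the lone "@" survives.
strict = re.sub(r"@(\w+)", r"\1", text)
print(strict)
```

In a Python raw string a single backslash is enough; inside the step's JSON configuration the backslash itself has to be escaped, hence `\\w`.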

Usage

The following example shows how the step can be used in a recipe.

Examples

To change the way dates are formatted in a column of texts from “2019-04-15” to “15.04.2019”, the pattern below matches three numbers separated by hyphens and replaces each occurrence with the same three numbers in reverse order, separated by periods.

replace_regex(ds.text, {
    "pattern": "(\\d+)-(\\d+)-(\\d+)",
    "replacement": "{3}.{2}.{1}"
}) -> (ds.replaced)
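For readers who want to check the pattern before running a recipe, the same substitution can be sketched with Python's standard `re` module. This is an approximation, not the step's implementation: the sample text is invented, and Python's `\3.\2.\1` stands in for the step's `{3}.{2}.{1}` placeholders.

```python
import re

# Hypothetical sample text, made up for illustration.
text = "Released on 2019-04-15, patched on 2019-05-02."

# Three digit groups separated by hyphens, as in the recipe above.
pattern = r"(\d+)-(\d+)-(\d+)"

# \3.\2.\1 emits the captured groups in reverse order, joined by periods.
replaced = re.sub(pattern, r"\3.\2.\1", text)
print(replaced)  # Released on 15.04.2019, patched on 02.05.2019.
```

Because `\d+` accepts any run of digits, the pattern also rewrites hyphen-separated numbers that are not dates; a stricter pattern such as `(\d{4})-(\d{2})-(\d{2})` narrows the match to the year-month-day shape.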

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

