Splits each value of a text column in two at the first occurrence of a given pattern, returning two new text columns. For example, splitting a text column on the comma character (",") produces two new columns: the first contains everything before the first comma in each text, and the second contains everything after that comma. If the specified pattern is not found in any of the input texts, the first output column will contain the original text and the second will contain only missing values (NaN).

Usage

The following example shows how the step can be used in a recipe.

Examples

For example, to split each text on the first comma encountered, searching from the left:
split_string(ds.text, {"pattern": ","}) -> (ds.text_left, ds.text_right)
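
The splitting behaviour described above can be approximated in plain Python. This is only an illustrative sketch, not the step's actual implementation: the helper name `split_first` is hypothetical, the pattern is assumed to be matched literally (not as a regular expression), and `None` stands in for a missing value (NaN).

```python
from typing import Optional, Tuple


def split_first(text: str, pattern: str = " ") -> Tuple[str, Optional[str]]:
    """Split `text` at the first literal occurrence of `pattern`.

    Hypothetical helper mirroring the documented behaviour: when the
    pattern is absent, the left part is the original text and the right
    part is missing (represented here as None).
    """
    left, sep, right = text.partition(pattern)
    if not sep:  # pattern not found in this text
        return text, None
    return left, right


rows = ["Doe, John", "a,b,c", "no comma here"]
pairs = [split_first(t, ",") for t in rows]

# Unzip into the two "columns" the step would return.
text_left = [p[0] for p in pairs]
text_right = [p[1] for p in pairs]
```

Note that only the first comma is consumed: "a,b,c" yields "a" on the left and "b,c" on the right, and any whitespace following the pattern is kept in the right-hand part.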

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input (column[text|category], required): A text column to split.
output_left (column[text], required): A text column containing the part to the left of the given split pattern.
output_right (column[text], required): A text column containing the part to the right of the given split pattern.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

pattern (string, default: " "): A pattern of characters indicating where to split each text. Defaults to a single space (" ").
right (boolean, default: false): Whether to search for the pattern starting from the right instead of from the left (the default).
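
To illustrate what searching from the right means, here is a small Python sketch using `str.rpartition`. The helper name `split_last` is hypothetical, the pattern is assumed to be literal, and the handling of texts without a match (original text on the left, missing value on the right) is an assumption carried over from the default behaviour; the step itself may differ.

```python
from typing import Optional, Tuple


def split_last(text: str, pattern: str = " ") -> Tuple[str, Optional[str]]:
    """Split `text` at the last literal occurrence of `pattern`.

    Hypothetical analogue of setting {"right": true}. The no-match
    behaviour below is assumed, not confirmed by the documentation.
    """
    left, sep, right = text.rpartition(pattern)
    if not sep:  # pattern not found in this text
        return text, None
    return left, right


left, right = split_last("a,b,c", ",")
```

With the pattern "," the text "a,b,c" is split at its last comma, so the left part is "a,b" and the right part is "c", whereas searching from the left would give "a" and "b,c".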