split_string
Split a single text column into two.
The values of a text column are split in two at the first occurrence of a given pattern, producing two new text columns. For example, splitting a text column on the comma character (",") produces two new columns: the first contains everything before the first comma in each text, and the second contains everything after it.
If the split pattern does not occur in any of the input texts, the first output column will contain the original texts unchanged, and the second will contain only missing values (NaN).
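A minimal pure-Python sketch of the splitting behaviour described above, for illustration only and not the step's actual implementation. It assumes that the pattern is matched as a literal substring (not a regular expression), that the separator itself is dropped, that surrounding whitespace is kept, and that an individual text without the pattern yields NaN in the second column. The function name `split_on_first` is hypothetical.

```python
import math


def split_on_first(texts, pattern):
    """Hypothetical helper: split each text at the first occurrence of `pattern`.

    Assumption: `pattern` is a literal substring, not a regular expression.
    Returns two lists: the text before the pattern and the text after it.
    """
    before, after = [], []
    for text in texts:
        # str.partition splits at the first occurrence only and returns
        # (head, separator, tail); separator is "" when the pattern is absent.
        head, sep, tail = text.partition(pattern)
        before.append(head)  # the whole text when the pattern is absent
        # Assumption: a text without the pattern gets NaN in the second column.
        after.append(tail if sep else math.nan)
    return before, after


# Split on the first comma, scanning each text from the left.
first, second = split_on_first(["Smith, John", "Doe, Jane, Jr.", "Prince"], ",")
print(first)   # ['Smith', 'Doe', 'Prince']
print(second)  # [' John', ' Jane, Jr.', nan]
```

Because only the first comma is used, "Doe, Jane, Jr." keeps its second comma in the second column. If no text contains the pattern at all, the first list is the input unchanged and the second contains only NaN, matching the behaviour described above.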
Usage
The following example shows how the step can be used in a recipe.
For example, to split each text on the first comma encountered, starting from the left:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).