Extract parts of texts detected using regular expressions.
output parameter. Check the references below to familiarize yourself with the regex language:
Also see the pattern parameter below for more details.
Examples
ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
"concat_matches": true or "extract_all": false (matches are strings), and "as_category": false
"concat_matches": true or "extract_all": false (matches are strings), and "as_category": true
"extract_all": true and "concat_matches": false
step(..., {"param": "value", ...}) -> (output).
Parameters
output parameter. The latter uses google-re2 string replacement with curly braces and numerical identifiers, e.g. "{1}" instead of the usual regex syntax using backslashes, like “\1”. Numerical identifiers refer to capturing groups in the regex pattern (named groups are not supported), where "{0}" refers to the entire match, i.e. using it on its own simply returns the full match.
For example, if a column of texts includes Twitter mentions of the form “@abc”, the regular expression "pattern": "(@)(\\w*)" will match these mentions and save the “@” character and the actual name in two separate capturing groups. Using the output format "output": "Match: {0}, Tag: {1}, Name: {2}" will then return matches in the form “Match: @abc, Tag: @, Name: abc”.
Examples
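The curly-brace output format can be illustrated with a small Python sketch. It uses Python's standard re module rather than google-re2, and the helper name format_matches is hypothetical, so treat it as an approximation of the behaviour described here, not the step's actual implementation.

```python
import re


def format_matches(text, pattern, output="{0}"):
    """Return one formatted string per match of `pattern` in `text`.

    Hypothetical helper: "{0}" is filled with the full match, and
    "{1}", "{2}", ... with the numbered capturing groups.
    """
    results = []
    for m in re.finditer(pattern, text):
        # Groups that did not take part in the match become empty strings.
        groups = m.groups(default="")
        results.append(output.format(m.group(0), *groups))
    return results


# The Twitter mention example, with the tag and the name in separate groups.
formatted = format_matches(
    "thanks @abc!",
    r"(@)(\w*)",
    "Match: {0}, Tag: {1}, Name: {2}",
)
print(formatted)  # ['Match: @abc, Tag: @, Name: abc']
```

Leaving the output argument at "{0}" returns the full matches unchanged, which mirrors the full-match behaviour described for the output parameter.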
Array items
ascii
ignorecase
locale
multiline
dotall
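As a rough illustration of what such flags typically do, the sketch below uses the identically named flags of Python's standard re module. Whether the step's flags behave exactly the same way is an assumption; locale is left out because Python only supports it for bytes patterns.

```python
import re

text = "First line\nsecond LINE"

# ignorecase: letters match regardless of case.
ignorecase_hits = re.findall(r"line", text, flags=re.IGNORECASE)

# multiline: ^ and $ also match at the start and end of every line.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# dotall: "." also matches newline characters.
default_span = re.search(r"First.*LINE", text)
dotall_span = re.search(r"First.*LINE", text, flags=re.DOTALL)

# ascii: \w, \d, \s, ... only match ASCII characters.
unicode_words = re.findall(r"\w+", "café")
ascii_words = re.findall(r"\w+", "café", flags=re.ASCII)

print(ignorecase_hits)           # ['line', 'LINE']
print(default_starts)            # ['First']
print(multiline_starts)          # ['First', 'second']
print(default_span is None)      # True
print(dotall_span is not None)   # True
print(unicode_words)             # ['café']
print(ascii_words)               # ['caf']
```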
"extract_all": false"
or "concat_matches": false
),
whether to return a column of type category rather than text.