A regular expression.
The pattern to be matched in input texts. It may include (numbered) capturing groups, which lets this method use parts of a match to format how matches are represented in the output via the output parameter. The output parameter uses google-re2 string replacement with curly braces and numerical identifiers, e.g. "{1}", instead of the usual regex syntax with backslashes, like "\1". Numerical identifiers refer to capturing groups in the regex pattern (named groups are not supported), where
- 0 is the whole match
- 1 is the 1st capturing group
- 2 is the 2nd capturing group
- etc…
The default is "{0}", i.e. simply returning the full match.
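The following minimal sketch shows how the group numbering maps onto the output template. It is an illustration only, not this method's implementation: it assumes Python's standard `re` module as a stand-in for google-re2 and emulates the curly-brace replacement with `str.format`. The function name `extract_matches` and the sample phone-number text are hypothetical.

```python
import re

def extract_matches(text, pattern, output="{0}"):
    """Return one formatted string per non-overlapping match of `pattern` in `text`.

    Hypothetical helper: {0} is the whole match, {1} the 1st capturing group,
    {2} the 2nd capturing group, and so on.
    """
    results = []
    for m in re.finditer(pattern, text):
        # Index 0 is the full match; indices 1..n are the numbered capturing groups.
        # Assumption: groups that did not participate in the match become "".
        groups = [m.group(0)] + [g if g is not None else "" for g in m.groups()]
        results.append(output.format(*groups))
    return results

text = "call 555-1234 or 555-9876"

# With the default output "{0}", each full match is returned unchanged.
print(extract_matches(text, r"(\d{3})-(\d{4})"))
# ['555-1234', '555-9876']

# Reordering capturing groups in the output template.
print(extract_matches(text, r"(\d{3})-(\d{4})", "{2}/{1}"))
# ['1234/555', '9876/555']
```

Because only numerical identifiers are substituted, the template can reorder, repeat, or drop groups freely.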
For example, if a column of texts includes Twitter mentions of the form “@abc”, the regular expression
"pattern": "(@)(\\w*)"
will match these mentions and save the “@” character and the actual name in two separate capturing groups.
Using the output format
"output": "Match: {0}, Tag: {1}, Name: {2}"
will then return matches in the form “Match: @abc, Tag: @, Name: abc”.
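The Twitter example can be reproduced with a short Python sketch, again assuming the standard `re` module in place of google-re2 and `str.format` in place of its curly-brace replacement. Note that the JSON string "(@)(\\w*)" corresponds to the regex (@)(\w*), written below as a raw string; the sample text is hypothetical.

```python
import re

pattern = r"(@)(\w*)"                       # same regex as the "pattern" value above
output = "Match: {0}, Tag: {1}, Name: {2}"  # same template as the "output" value above

text = "Thanks @abc for the tip!"  # hypothetical input text

# {0} -> whole match, {1} -> the "@" character, {2} -> the name after "@"
results = [output.format(m.group(0), m.group(1), m.group(2))
           for m in re.finditer(pattern, text)]

print(results)
# ['Match: @abc, Tag: @, Name: abc']
```

Since \w* also matches an empty string, a lone “@” would still produce a match with an empty Name.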