concatenate
Concatenate columns as text or lists with optional separator as well as pre- and postfix.
If only a single input column is provided, even if it is a list, the result will be a text column by default.
If multiple columns are passed, and any of these contains lists, then the result is also a column of lists. In this case, each output list will contain the result of concatenating all elements in the corresponding row, whether these elements are themselves lists or not.
If none of the multiple input columns contains lists, the result will be a text column. Each input column will be converted to a string representation if necessary, and then concatenated with a given separator and pre- and/or postfix.
You can change this default behavior by explicitly setting an out_type in params.
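The default rules above can be sketched in plain Python. This is only an illustration of the described behavior, not the step's actual implementation. The helper names (default_out_kind, concat_row_as_list) are hypothetical, and it is an assumption that a column "contains lists" when any of its values is a list.

```python
def is_list(value):
    """True if a single cell holds a list."""
    return isinstance(value, list)


def default_out_kind(columns):
    """Return the default result kind for the given input columns.

    `columns` is a list of columns, each column being a list of row values.
    """
    if len(columns) == 1:
        # A single input column gives text by default, even if it holds lists.
        return "text"
    has_lists = any(is_list(v) for col in columns for v in col)
    return "list" if has_lists else "text"


def concat_row_as_list(values):
    """Combine one row's values into a single list, whether or not
    the individual values are themselves lists."""
    out = []
    for v in values:
        if is_list(v):
            out.extend(v)
        else:
            out.append(v)
    return out
```

For a row holding the scalar "x" and the list ["a", "b"], concat_row_as_list returns a single list containing all three elements.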
Usage
The following examples show how the step can be used in a recipe.
The following example combines first names, last names and a title to create a new column with values in the form “Dr. first_name last_name”:
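As a rough illustration of the result described above, the following Python sketch emulates the text concatenation outside of a recipe. It assumes the title is supplied as a prefix and that a single space is used as separator. The function concat_text, its argument names and the sample names are hypothetical and only serve the example.

```python
def concat_text(rows, separator="", prefix="", postfix=""):
    """Join each row's values as strings, then wrap the result in prefix/postfix."""
    return [
        prefix + separator.join(str(v) for v in row) + postfix
        for row in rows
    ]


# Illustrative sample data, not taken from any real dataset.
first_names = ["Ada", "Alan"]
last_names = ["Lovelace", "Turing"]

full_names = concat_text(
    zip(first_names, last_names),
    separator=" ",
    prefix="Dr. ",
)
```

Each value in full_names then has the form "Dr. first_name last_name".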
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
A separator to use between the elements of the individual columns when concatenating as text.
A prefix to prepend to the result of the concatenation (or to a single column if no more were provided).
A postfix to append to the result of the concatenation (or to a single column if no more were provided).
How to represent missing values (NaN) in the concatenated result.
If a “nan_as” value is specified, this will be used to fill in missing values during concatenation.
With "nan_as": null, the concatenation will produce a missing value in every row where at least one of the columns to be concatenated has a missing value.
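The two cases for missing values can be sketched as follows. This is a minimal Python emulation of the behavior described above, not the step's own code. The function concat_with_nan_as is hypothetical, and treating both None and float NaN as "missing" is an assumption.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def concat_with_nan_as(row, separator="", nan_as=None):
    """Concatenate one row as text, handling missing values.

    With nan_as=None (the equivalent of "nan_as": null), any missing
    value in the row makes the whole result missing. Otherwise missing
    values are replaced by nan_as before joining.
    """
    if any(is_missing(v) for v in row):
        if nan_as is None:
            return None
        row = [nan_as if is_missing(v) else v for v in row]
    return separator.join(str(v) for v in row)
```

Calling concat_with_nan_as(["a", None], "-", nan_as="?") fills the gap with the replacement value, while the same call without nan_as returns a missing value for the whole row.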
The semantic data type of the output column. Note, if this type is not compatible with the result of the concatenation, the output may consist of missing values (NaNs) only.
Values must be one of the following:
category
date
number
currency
url
boolean
text
list[category]
list[date]
list[number]
list[currency]
list[url]
list[boolean]
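To see why an incompatible output type can leave only missing values, consider this small Python sketch. How the step converts values internally is not documented here, so the conversion below (a hypothetical cast_to_number helper built on float()) is just one plausible reading of the note above.

```python
def cast_to_number(values):
    """Try to read each concatenated value as a number.

    Values that cannot be parsed become None, standing in for a
    missing value (NaN) in the output column.
    """
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            out.append(None)
    return out


# Concatenated names cannot be read as numbers, so every row ends up missing.
as_numbers = cast_to_number(["Dr. Ada Lovelace", "Dr. Alan Turing"])
```

In this sketch, numeric-looking strings such as "12" or "3.5" convert without problems, whereas free text produces a column made up entirely of missing values.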