join parameter controls whether only the common columns are kept (inner), or all columns (outer). In the latter case, rows will have missing values (NaNs) where a column only existed in one of the two datasets.
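The difference between the two settings can be illustrated outside the recipe language. The following is a minimal sketch in pandas, not the step's actual implementation; the example data and the use of pd.concat are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical example datasets: only the "id" column is shared.
ds_left = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
ds_right = pd.DataFrame({"id": [3], "city": ["Oslo"]})

# "inner": keep only the columns present in both datasets.
inner = pd.concat([ds_left, ds_right], join="inner", ignore_index=True)

# "outer": keep all columns; cells without a source value become NaN.
outer = pd.concat([ds_left, ds_right], join="outer", ignore_index=True)

print(list(inner.columns))  # ['id']
print(list(outer.columns))  # ['id', 'name', 'city']
print(outer["city"].isna().tolist())  # [True, True, False]
```

In both cases the result has three rows; only the set of columns differs.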
Usage
The following example shows how the step can be used in a recipe.
Examples
To append the rows of dataset ds_right to the dataset ds_left, keeping all columns from both datasets:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
A dataset containing the rows of both ds_left and ds_right, as well as an additional column original_index indicating the index of each row in its original dataset.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Whether to concatenate using an “inner” or “outer” join of columns. When "inner", only common columns will be kept. When "outer", all columns will be kept.
Values must be one of the following:
inner
outer
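Taken together, the behaviour described on this page (appending rows, recording each row's original index, and restricting join to the two allowed values) can be approximated as follows. This is a rough sketch in pandas under stated assumptions: the function name concatenate_sketch and the example data are hypothetical, and the real step may differ in details such as column order or index handling.

```python
import pandas as pd


def concatenate_sketch(ds_left, ds_right, join):
    """Hypothetical pandas approximation of the step described above."""
    # Only the two documented values are accepted.
    if join not in ("inner", "outer"):
        raise ValueError('join must be "inner" or "outer"')

    parts = []
    for ds in (ds_left, ds_right):
        part = ds.copy()
        # Remember where each row sat in its source dataset.
        part["original_index"] = ds.index
        parts.append(part)

    # Stack the rows; join decides which columns survive.
    return pd.concat(parts, join=join, ignore_index=True)


# Hypothetical example data.
ds_left = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
ds_right = pd.DataFrame({"id": [3], "city": ["Oslo"]})

result = concatenate_sketch(ds_left, ds_right, "outer")
print(result["original_index"].tolist())  # [0, 1, 0]
```

Because original_index is added to both datasets before they are combined, it is also kept when join is "inner".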