I.e., the equivalent of a database join of two tables. Adds the columns from the second dataset (ds_right) to the first (ds_left). If the two datasets contain columns with identical names (other than those used to perform the join), configurable suffixes will be appended to their names in the resulting dataset (see suffixes parameter below).

The rows included in the result depend on the kind of join (see the how parameter below). Depending on whether it’s a left, right, inner, or outer-join, may include rows from either dataset or both.

The join performed is always an equi-join, meaning that rows from the left are matched with rows from the right where their respective values in the join column (or indexes) are identical (e.g. where the value of column id on the left is equal to the value of column id on the right).

Also see Wikipedia’s article on table joins to learn more about them.

Usage

The following example shows how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).