join

The equivalent of a database join of two tables. Adds the columns from the second dataset (ds_right) to the first (ds_left). If the two datasets contain columns with identical names (other than those used to perform the join), configurable suffixes are appended to their names in the resulting dataset (see the suffixes parameter below). The rows included in the result depend on the kind of join (see the how parameter below): a left, right, inner, or outer join may include rows from either dataset or from both. The join performed is always an equi-join, meaning that rows from the left are matched with rows from the right where their values in the join columns (or indexes) are identical (e.g. where the value of column id on the left equals the value of column id on the right). See also Wikipedia’s article on table joins to learn more about them.
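To make the row and column rules above concrete, here is a minimal Python sketch. It assumes the step follows standard database join semantics. The function names (joined_keys, apply_suffixes) and the default suffixes "_left" and "_right" are hypothetical and chosen for illustration only; the step's actual defaults are not stated here.

```python
def joined_keys(left_keys, right_keys, how):
    """Return the join-key values that appear in the result for a given kind of join."""
    left, right = set(left_keys), set(right_keys)
    if how == "left":    # every key present on the left
        return left
    if how == "right":   # every key present on the right
        return right
    if how == "inner":   # only keys present on both sides
        return left & right
    if how == "outer":   # keys present on either side
        return left | right
    raise ValueError(f"unknown join kind: {how!r}")


def apply_suffixes(left_cols, right_cols, join_cols=(), suffixes=("_left", "_right")):
    """Rename columns that exist on both sides, ignoring the join columns."""
    clash = (set(left_cols) & set(right_cols)) - set(join_cols)
    new_left = [c + suffixes[0] if c in clash else c for c in left_cols]
    new_right = [c + suffixes[1] if c in clash else c for c in right_cols]
    return new_left, new_right


# Keys 2 and 3 exist on both sides; 1 only on the left, 4 only on the right.
inner = sorted(joined_keys([1, 2, 3], [2, 3, 4], "inner"))
outer = sorted(joined_keys([1, 2, 3], [2, 3, 4], "outer"))

# "name" exists in both datasets and is not a join column, so it gets suffixed.
left_names, right_names = apply_suffixes(
    ["id", "name", "dept"], ["id", "name"], join_cols=("id",)
)
```

In this sketch an equi-join only ever compares key values for equality, which is why plain set operations are enough to decide which keys survive each kind of join.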

Usage

The following example shows how the step can be used in a recipe.

Examples

For example, to enrich a dataset containing employees with information about their department:

join(employees, departments, {
  "how": "left",
  "left": "department_id",
  "right": "id"
}) -> (employees_dep)
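As a rough illustration of what the recipe above computes, the following Python sketch performs the same left equi-join on a few made-up rows. The sample data and the left_join helper are hypothetical, not part of the step. The sketch also assumes that the right-hand key column (id) is not repeated in the result and that no other column names clash; the step itself may handle both differently.

```python
# Hypothetical sample data standing in for the two input datasets.
employees = [
    {"name": "Ada", "department_id": 1},
    {"name": "Grace", "department_id": 2},
    {"name": "Linus", "department_id": 9},  # no matching department
]
departments = [
    {"id": 1, "department": "Research"},
    {"id": 2, "department": "Engineering"},
]


def left_join(left_rows, right_rows, left_on, right_on):
    """Left equi-join: keep every left row, add right columns where keys are equal."""
    # Index the right rows by key value; one key may match several rows.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_on], []).append(row)

    # Columns contributed by the right side (key column dropped by assumption).
    right_cols = [c for c in (right_rows[0] if right_rows else {}) if c != right_on]

    result = []
    for row in left_rows:
        matches = index.get(row[left_on])
        if matches:
            for match in matches:
                result.append({**row, **{c: match[c] for c in right_cols}})
        else:
            # Unmatched left rows are kept, with missing values on the right.
            result.append({**row, **{c: None for c in right_cols}})
    return result


employees_dep = left_join(employees, departments, "department_id", "id")
```

Because how is "left", the employee whose department_id has no match still appears in employees_dep, with an empty department value. With "inner" that row would be dropped instead.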

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters
