filter_duplicates

Usage

The following example shows how the step can be used in a recipe.

Examples

To keep only the first row in each set of duplicates, where duplicates are identified by the values in the columns “address” and “name”:

filter_duplicates(ds, {"columns": ["address", "name"], "keep": "first"}) -> (ds_filtered)
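To make the semantics of the example concrete, the following is a minimal Python sketch of the behaviour described above. It is not the step's actual implementation. The function name filter_duplicates_sketch, the row-as-dict representation, and the "last" option for keep are assumptions made for illustration; only "first" appears in the example above.

```python
from typing import Any

Row = dict[str, Any]


def filter_duplicates_sketch(
    rows: list[Row], columns: list[str], keep: str = "first"
) -> list[Row]:
    """Drop rows whose values in `columns` repeat an already-kept row.

    Hypothetical stand-in for the step: `keep="first"` mirrors the
    documented example; `keep="last"` is an assumed variant.
    """
    if keep not in ("first", "last"):
        raise ValueError(f"unsupported keep value: {keep!r}")

    # Pair each row with its position so the original order can be restored.
    indexed = list(enumerate(rows))
    if keep == "last":
        # Scanning backwards means the first row seen per key is the last one.
        indexed.reverse()

    seen: set[tuple] = set()
    kept_positions: list[int] = []
    for position, row in indexed:
        # The duplicate key is the tuple of values in the chosen columns.
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            kept_positions.append(position)

    # Return the surviving rows in their original order.
    return [rows[position] for position in sorted(kept_positions)]


ds = [
    {"name": "Ann", "address": "1 Main St", "visits": 3},
    {"name": "Bob", "address": "2 Oak Ave", "visits": 1},
    {"name": "Ann", "address": "1 Main St", "visits": 7},
]

ds_filtered = filter_duplicates_sketch(ds, ["address", "name"], keep="first")
print(ds_filtered)
```

Here the third row repeats the “address” and “name” values of the first, so only the first two rows remain. Columns not listed in "columns" (such as "visits" in this sketch) play no part in deciding what counts as a duplicate.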

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object passed as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters
