Usage
The following example shows how the step can be used in a recipe.
Examples
To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
A dataset containing the nodes (rows) to de-duplicate and the links between those nodes.
Outputs
A new dataset containing the same columns as the input data, but without duplicate rows and with connections rewired such that none points to a deleted node.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Similarity threshold for candidate nodes to be eliminated. Any node linked to another node with a weight (usually similarity) greater than this value will be eliminated. Default (null) corresponds to positive infinity (no de-duplication).
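The threshold behaviour described above can be illustrated with a small Python sketch. This is not the step's actual implementation: the function name dedupe_nodes, the (source, target, weight) link format, the rule of keeping the node that appears first, and keeping the highest weight when rewired links coincide are all assumptions made for illustration.

```python
import math

def dedupe_nodes(nodes, links, threshold=None):
    """Hypothetical sketch: nodes is a list of ids, links a list of
    (source, target, weight) tuples. Returns (kept_nodes, rewired_links)."""
    # A null threshold is treated as positive infinity, so nothing is merged.
    limit = math.inf if threshold is None else threshold
    parent = {n: n for n in nodes}
    order = {n: i for i, n in enumerate(nodes)}

    def find(n):
        # Follow parent pointers to the surviving representative node.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge every pair linked with a weight above the threshold.
    # Assumption: the node appearing first in `nodes` survives.
    for s, t, w in links:
        if w > limit:
            rs, rt = find(s), find(t)
            if rs != rt:
                keep, drop = (rs, rt) if order[rs] < order[rt] else (rt, rs)
                parent[drop] = keep

    kept = [n for n in nodes if find(n) == n]

    # Rewire links so that none points to a deleted node.
    rewired = {}
    for s, t, w in links:
        rs, rt = find(s), find(t)
        if rs == rt:
            continue  # a link between merged duplicates disappears
        rewired[(rs, rt)] = max(w, rewired.get((rs, rt), -math.inf))
    return kept, [(s, t, w) for (s, t), w in rewired.items()]

nodes = ["a", "b", "c"]
links = [("a", "b", 0.95), ("b", "c", 0.5)]
kept, new_links = dedupe_nodes(nodes, links, threshold=0.9)
# kept == ["a", "c"]; new_links == [("a", "c", 0.5)]
```

With threshold=None the same call returns all three nodes and both links unchanged, matching the default of no de-duplication.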