filter_duplicate_nodes
Remove duplicate nodes in a network.
For each pair of nodes connected by a link that indicates a similarity greater than a specified threshold, keeps only one of the two nodes and rewires the deleted node’s incoming and outgoing links to point to the “surviving” node.
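The behaviour described above can be illustrated with a minimal Python sketch. This is not the step's actual implementation: the function signature, the `(source, target, weight)` link format, and the rule that the link's source node is the one that survives are all assumptions made for illustration.

```python
import math


def filter_duplicate_nodes(nodes, links, threshold=None):
    """Illustrative sketch only (hypothetical signature).

    nodes: list of node ids
    links: list of (source, target, weight) tuples
    """
    # None mirrors the documented null default: positive infinity,
    # so no link can exceed it and nothing is removed.
    limit = math.inf if threshold is None else threshold

    survivor = {}  # maps each removed node to the node that replaced it

    def resolve(node):
        # Follow replacement chains (a -> b -> c) to the final surviving node.
        while node in survivor:
            node = survivor[node]
        return node

    for source, target, weight in links:
        if weight > limit:
            keep, drop = resolve(source), resolve(target)
            if keep != drop:
                # Assumption: the link's source survives, the target is removed.
                survivor[drop] = keep

    kept_nodes = [n for n in nodes if n not in survivor]

    # Rewire every link so that removed endpoints point to their survivor.
    rewired = []
    for source, target, weight in links:
        s, t = resolve(source), resolve(target)
        if s != t:  # drop self-loops created by merging two nodes
            rewired.append((s, t, weight))
    return kept_nodes, rewired


nodes = ["a", "b", "c"]
links = [("a", "b", 0.95), ("b", "c", 0.4)]
kept, rewired = filter_duplicate_nodes(nodes, links, threshold=0.9)
```

In this example `b` is removed as a duplicate of `a`, and the link from `b` to `c` is rewired to start at `a`. The sketch does not merge parallel links that rewiring may produce.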
Usage
The following example shows how the step can be used in a recipe.
To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Similarity threshold for candidate nodes to be eliminated.
For any pair of nodes linked with a weight (usually similarity) greater than this value, one of the two nodes will be eliminated. The default (null) corresponds to positive infinity (no de-duplication).
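The threshold comparison can be sketched in Python as follows. The helper name `exceeds_threshold` is hypothetical. The strict comparison follows from the "greater than" wording above, so a weight exactly equal to the threshold would not trigger de-duplication under this reading.

```python
import math


def exceeds_threshold(weight, threshold=None):
    # A null (None) threshold is treated as positive infinity,
    # which no finite weight can exceed.
    limit = math.inf if threshold is None else threshold
    return weight > limit


above = exceeds_threshold(0.95, 0.9)    # weight strictly above the threshold
boundary = exceeds_threshold(0.9, 0.9)  # weight equal to the threshold
unset = exceeds_threshold(1.0)          # default threshold (null)
```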