filter_duplicate_nodes
Remove duplicate nodes in a network.
For each pair of nodes connected by a link that indicates a similarity greater than a specified threshold, keeps only one of the two nodes and rewires the deleted node’s incoming and outgoing links to point to the “surviving” node.
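The rewiring described above can be sketched in plain Python. This is only an illustration under stated assumptions: the edge-list representation, the function name `dedupe_nodes`, and the rule that the link's source node is the one that survives are all hypothetical, and none of them is confirmed as the step's actual implementation.

```python
def dedupe_nodes(links, threshold):
    """Hypothetical sketch: `links` is a list of (source, target, weight) tuples."""
    survivor = {}  # maps each removed node to the node that replaces it

    def resolve(node):
        # Follow the chain of replacements to the final surviving node.
        while node in survivor:
            node = survivor[node]
        return node

    # Pass 1: decide which node of each too-similar pair is removed.
    for source, target, weight in links:
        if weight > threshold:
            keep, drop = resolve(source), resolve(target)
            if keep != drop:
                survivor[drop] = keep  # assumption: the source side survives

    # Pass 2: rewire every link so it points at surviving nodes only.
    rewired = []
    for source, target, weight in links:
        s, t = resolve(source), resolve(target)
        if s != t:  # links that collapse into self-loops are dropped
            rewired.append((s, t, weight))
    return rewired, set(survivor)


links = [("a", "b", 0.95), ("b", "c", 0.4), ("d", "b", 0.3)]
rewired, removed = dedupe_nodes(links, threshold=0.9)
```

Here node "b" is removed because its link to "a" exceeds 0.9, and the links that touched "b" now touch "a" instead. The sketch does not merge parallel links that may result from rewiring; how the real step handles those is not stated here.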
Usage
The following example shows how the step can be used in a recipe.
To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9
General syntax for using the step in a recipe, showing the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally
columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced
by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Similarity threshold for candidate nodes to be eliminated.
For any pair of nodes linked with a weight (usually similarity) greater than this value,
one of the two nodes will be eliminated. Default (null) corresponds to positive infinity (no de-duplication).
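The default behaviour can be illustrated with a short Python sketch. The helper names `effective_threshold` and `is_duplicate_link` are hypothetical and exist only for this example; the sketch assumes a strict "greater than" comparison, as the description states.

```python
import math


def effective_threshold(value=None):
    # A missing (null) threshold is treated as positive infinity.
    return math.inf if value is None else float(value)


def is_duplicate_link(weight, threshold=None):
    # A link marks a duplicate pair only if its weight strictly exceeds the threshold.
    return weight > effective_threshold(threshold)
```

With no threshold given, no finite weight can exceed it, so no node is ever removed. A link whose weight equals the threshold exactly is also left alone under this reading.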