Filter duplicate nodes


Remove duplicate nodes in a network.

For each pair of nodes connected by a link whose weight indicates a similarity greater than a specified threshold, the step keeps only one of the two nodes and rewires the deleted node's incoming and outgoing links to point to the "surviving" node.
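The behaviour described above can be illustrated with a minimal Python sketch. It is not the step's actual implementation: the function name `dedupe_network`, the representation of links as `(source, target, weight)` tuples, the choice of the link's source as the surviving node, and the removal of self-loops after rewiring are all assumptions made for illustration.

```python
def dedupe_network(nodes, links, threshold):
    """Illustrative only. nodes: list of ids; links: list of (source, target, weight)."""
    # survivor[n] is the node that replaces n; initially every node stands for itself.
    survivor = {n: n for n in nodes}

    def find(n):
        # Follow replacements until reaching a node that has not been deleted.
        while survivor[n] != n:
            n = survivor[n]
        return n

    for source, target, weight in links:
        if weight > threshold:  # strictly greater than the threshold
            a, b = find(source), find(target)
            if a != b:
                survivor[b] = a  # assumption: the source side of the link survives

    kept_nodes = [n for n in nodes if find(n) == n]

    # Rewire every link onto the surviving nodes; drop links that collapse
    # into self-loops (assumption, the source does not specify this).
    rewired = []
    for source, target, weight in links:
        a, b = find(source), find(target)
        if a != b:
            rewired.append((a, b, weight))
    return kept_nodes, rewired


nodes = ["a", "b", "c"]
links = [("a", "b", 0.95), ("b", "c", 0.4)]
print(dedupe_network(nodes, links, 0.9))
# (['a', 'c'], [('a', 'c', 0.4)])
```

In this example node `b` is treated as a duplicate of `a`, so it is dropped and its link to `c` is rewired to start from `a`.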


The following are the step's expected inputs and outputs and their specific types.

Step signature
filter_duplicate_nodes(network: dataset, {"param": value}) -> (network_flt: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.


To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9:

Example call (in recipe editor)
filter_duplicate_nodes(network, {
  "duplicate_threshold": 0.9
}) -> (network_filtered)


network: dataset

The dataset containing the nodes (rows) to de-duplicate and the links between those nodes.


network_flt: dataset

A new dataset containing the same columns as the input data, but with duplicate rows removed and links rewired so that none points to a deleted node.


duplicate_threshold: number | null

Similarity threshold above which a node becomes a candidate for elimination. For any pair of nodes linked with a weight (usually similarity) greater than this value, one of the two nodes will be eliminated. The default (null) corresponds to positive infinity, i.e. no de-duplication.
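The threshold semantics can be sketched in a few lines of Python. The helper names `effective_threshold` and `is_duplicate_link` are hypothetical and only mirror the description above: a null (here `None`) threshold is read as positive infinity, and the comparison is strict, so a weight exactly equal to the threshold does not trigger de-duplication.

```python
import math


def effective_threshold(duplicate_threshold):
    # A missing threshold (None) is interpreted as positive infinity.
    return math.inf if duplicate_threshold is None else float(duplicate_threshold)


def is_duplicate_link(weight, duplicate_threshold=None):
    # Strict comparison: only weights greater than the threshold qualify.
    return weight > effective_threshold(duplicate_threshold)


print(is_duplicate_link(0.95, 0.9))  # True
print(is_duplicate_link(0.9, 0.9))   # False
print(is_duplicate_link(1.0))        # False
```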