Filter duplicate nodes

networkgraph

Remove duplicate nodes in a network.

For each pair of nodes connected by a link whose weight indicates a similarity greater than a specified threshold, the step keeps only one of the two nodes and rewires the deleted node's incoming and outgoing links to point to the "surviving" node.
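The behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the step's actual implementation: the node list, the `(source, target, weight)` link tuples, the rule that the link's source is the survivor, and the dropping of links that collapse onto a single node are all choices made for this example.

```python
import math


def filter_duplicate_nodes(nodes, links, duplicate_threshold=None):
    """Illustrative sketch only; not the step's actual implementation."""
    # A null threshold behaves like +infinity: no link weight can exceed it.
    threshold = math.inf if duplicate_threshold is None else duplicate_threshold
    survivor = {}  # deleted node -> node it was merged into

    def resolve(node):
        # Follow the chain of merges until reaching a node that still exists.
        while node in survivor:
            node = survivor[node]
        return node

    for source, target, weight in links:
        a, b = resolve(source), resolve(target)
        if weight > threshold and a != b:
            survivor[b] = a  # assumption: the link's source survives

    kept_nodes = [n for n in nodes if n not in survivor]

    rewired = []
    for source, target, weight in links:
        a, b = resolve(source), resolve(target)
        if a != b:  # assumption: links collapsed onto one node are dropped
            rewired.append((a, b, weight))
    return kept_nodes, rewired


nodes = ["a", "b", "c"]
links = [("a", "b", 0.95), ("b", "c", 0.4)]
kept_nodes, rewired = filter_duplicate_nodes(nodes, links, 0.9)
print(kept_nodes)  # ['a', 'c']
print(rewired)     # [('a', 'c', 0.4)]
```

Here node `b` is removed because its link to `a` exceeds 0.9, and the link from `b` to `c` is rewired so that it starts at `a`.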

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
filter_duplicate_nodes(network: dataset, {
    "param": value
}) -> (network_flt: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.

Example

To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9:

Example call (in recipe editor)
filter_duplicate_nodes(network, {
  "duplicate_threshold": 0.9
}) -> (network_filtered)

Inputs


network: dataset

Dataset containing the nodes (rows) to de-duplicate and the links between them.

Outputs


network_flt: dataset

A new dataset containing the same columns as the input data, with duplicate rows removed and links rewired so that none points to a deleted node.

Parameters


duplicate_threshold: number | null

Similarity threshold above which linked nodes are treated as duplicates. For any two nodes linked with a weight (usually a similarity) greater than this value, one of the two is eliminated. The default (null) corresponds to positive infinity (no de-duplication).
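To get a feel for how a given threshold would behave before running the step, the comparison can be mimicked on a small list of links. The helper below is hypothetical and assumes links are available as `(source, target, weight)` tuples; it only mirrors the "greater than" comparison and the null default described above.

```python
import math


def links_above_threshold(links, duplicate_threshold=None):
    """Return the links whose weight would flag their endpoints as duplicates."""
    # Null is treated as +infinity, so no link qualifies by default.
    threshold = math.inf if duplicate_threshold is None else duplicate_threshold
    # Strict comparison: a weight equal to the threshold does not qualify.
    return [link for link in links if link[2] > threshold]


links = [("a", "b", 0.9), ("a", "c", 0.91), ("b", "c", 0.2)]
print(links_above_threshold(links, 0.9))  # [('a', 'c', 0.91)]
print(links_above_threshold(links))       # []
```

Note that the link with weight exactly 0.9 is not selected, since the comparison is strict.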