Filter duplicate nodes
network • graph
Remove duplicate nodes in a network.
For each pair of nodes connected by a link whose weight indicates a similarity greater than a specified threshold, the step keeps only one of the two nodes and rewires the deleted node's incoming and outgoing links to point to the "surviving" node.
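The following is a minimal Python sketch of that procedure, not the step's actual implementation. It assumes nodes are hashable ids and links are (source, target, weight) tuples. The function name mirrors the step only for readability. The description above does not say which endpoint survives, how chains of duplicates behave, or what happens to self-loops and parallel links created by rewiring, so the sketch makes labeled assumptions: the source endpoint survives, chains (a~b, b~c) collapse into one node via union-find, self-loops are dropped, and parallel links keep the largest weight.

```python
import math
from typing import Hashable, List, Optional, Tuple

Link = Tuple[Hashable, Hashable, float]


def filter_duplicate_nodes(
    nodes: List[Hashable],
    links: List[Link],
    duplicate_threshold: Optional[float] = None,
) -> Tuple[List[Hashable], List[Link]]:
    """Merge nodes joined by a link heavier than the threshold, then rewire links."""
    # None (null) behaves like +infinity: no weight exceeds it.
    threshold = math.inf if duplicate_threshold is None else duplicate_threshold
    # Each node points to the node it was merged into; roots are survivors.
    parent = {n: n for n in nodes}

    def survivor(n: Hashable) -> Hashable:
        # Follow merge pointers to the surviving node (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src, dst, weight in links:
        if weight > threshold:
            keep, drop = survivor(src), survivor(dst)
            if keep != drop:
                # Assumption: the source endpoint's node survives.
                parent[drop] = keep

    survivors = [n for n in nodes if survivor(n) == n]

    rewired: dict = {}
    for src, dst, weight in links:
        a, b = survivor(src), survivor(dst)
        if a == b:
            # Assumption: links that collapse into self-loops are dropped.
            continue
        # Assumption: parallel links created by rewiring keep the max weight.
        rewired[(a, b)] = max(weight, rewired.get((a, b), -math.inf))

    return survivors, [(a, b, w) for (a, b), w in rewired.items()]
```

For example, with a threshold of 0.9, a link a→b of weight 0.95 merges b into a, and a link d→b of any weight is rewired to d→a. Union-find keeps each lookup of the surviving node cheap even when many nodes are merged into one.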
Usage
The following are the step's expected inputs and outputs and their specific types.
filter_duplicate_nodes(network: dataset, {"param": value}) -> (network_flt: dataset)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example
To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9:
filter_duplicate_nodes(network, {
"duplicate_threshold": 0.9
}) -> (network_filtered)
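To make the strict "greater than" comparison concrete, here is a minimal Python sketch that only lists which linked pairs a threshold of 0.9 would flag as duplicates. The link list is made-up illustration data in an assumed (source, target, weight) form, and the sketch does not perform the removal or rewiring itself.

```python
# Hypothetical link list: (source, target, similarity weight).
links = [
    ("a", "b", 0.95),
    ("b", "c", 0.30),
    ("c", "d", 0.91),
    ("a", "d", 0.90),
]
duplicate_threshold = 0.9

# Only weights strictly greater than the threshold mark duplicate pairs,
# so the ("a", "d") link with weight exactly 0.9 is not flagged.
duplicate_pairs = [(src, dst) for src, dst, w in links if w > duplicate_threshold]
print(duplicate_pairs)  # [('a', 'b'), ('c', 'd')]
```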
Inputs
network: dataset
Dataset containing the nodes (rows) to de-duplicate, together with the links between them.
Outputs
network_flt: dataset
A new dataset containing the same columns as the input data, but without duplicate rows, and with links rewired so that none points to a deleted node.
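The output guarantee can be stated as an invariant on node ids: every link must start and end at a node that still exists. The sketch below is a hypothetical Python check of that invariant, assuming links are available as (source, target, weight) tuples; the function name is illustrative and not part of the step's documented interface.

```python
from typing import Hashable, Iterable, Set, Tuple


def links_are_rewired(
    surviving_nodes: Set[Hashable],
    links: Iterable[Tuple[Hashable, Hashable, float]],
) -> bool:
    """Return True if every link starts and ends at a surviving node."""
    return all(src in surviving_nodes and dst in surviving_nodes for src, dst, _ in links)


surviving = {"a", "c", "d"}
print(links_are_rewired(surviving, [("a", "c", 0.3), ("d", "a", 0.2)]))  # True
print(links_are_rewired(surviving, [("d", "b", 0.2)]))  # False: "b" was deleted
```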
Parameters
duplicate_threshold: number | null
Similarity threshold above which linked nodes are treated as duplicates. For any two nodes connected by a link with a weight (usually similarity) greater than this value, one of the two nodes is eliminated. The default (null) corresponds to positive infinity, i.e. no de-duplication.
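The null default can be read as follows: a minimal Python sketch, assuming null maps to Python's None, in which the threshold is replaced by +infinity so that no finite link weight can exceed it. Both helper names are hypothetical.

```python
import math
from typing import Optional


def effective_threshold(duplicate_threshold: Optional[float]) -> float:
    """Map the parameter to the value link weights are compared against."""
    # null (None in Python) stands for +infinity.
    return math.inf if duplicate_threshold is None else float(duplicate_threshold)


def is_duplicate_link(weight: float, duplicate_threshold: Optional[float]) -> bool:
    # A link marks a duplicate pair only if its weight exceeds the threshold.
    return weight > effective_threshold(duplicate_threshold)


print(is_duplicate_link(0.95, 0.9))    # True
print(is_duplicate_link(1e300, None))  # False: nothing exceeds +inf
```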