For each pair of nodes connected by a link that indicates a similarity greater than a specified threshold, keeps only one of the two nodes and rewires the deleted node’s incoming and outgoing links to point to the “surviving” node.
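The behaviour described above can be illustrated with a minimal Python sketch. It is not the step's actual implementation: the function name `dedupe_network_sketch`, the `(source, target, weight)` link representation, the choice of the link's source as the surviving node, and the dropping of self-loops and repeated links after rewiring are all assumptions made for illustration.

```python
def dedupe_network_sketch(nodes, links, duplicate_threshold=None):
    """Hypothetical sketch: nodes is a list of ids, links a list of
    (source, target, weight) tuples. Returns (kept_nodes, rewired_links)."""
    if duplicate_threshold is None:
        # null threshold is treated as +infinity: nothing is de-duplicated
        return list(nodes), list(links)

    # Each node initially "survives" as itself.
    survivor = {n: n for n in nodes}

    def find(n):
        # Follow the chain of replacements to the final surviving node.
        while survivor[n] != n:
            survivor[n] = survivor[survivor[n]]
            n = survivor[n]
        return n

    for src, dst, weight in links:
        if weight > duplicate_threshold:
            a, b = find(src), find(dst)
            if a != b:
                # Assumption: the source side survives, the target is deleted.
                survivor[b] = a

    kept = [n for n in nodes if find(n) == n]

    rewired, seen = [], set()
    for src, dst, weight in links:
        a, b = find(src), find(dst)
        if a == b or (a, b) in seen:
            # Assumption: drop self-loops and repeated links after rewiring.
            continue
        seen.add((a, b))
        rewired.append((a, b, weight))
    return kept, rewired


nodes = ["a", "b", "c"]
links = [("a", "b", 0.95), ("b", "c", 0.5), ("c", "a", 0.3)]
kept, rewired = dedupe_network_sketch(nodes, links, duplicate_threshold=0.9)
```

In this toy network, "b" is merged into "a" because their link weight exceeds 0.9, and the link that originally left "b" now leaves "a".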

Usage

The following example shows how the step can be used in a recipe.

Examples

To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9:
filter_duplicate_nodes(network, {
  "duplicate_threshold": 0.9
}) -> (network_filtered)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
network
dataset
required
Dataset containing the nodes (rows) to de-duplicate and the links between them.
network_flt
dataset
required
A new dataset containing the same columns as the input data, but without duplicate rows and having connections rewired such that none points to a deleted node.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

duplicate_threshold
[number, null]
required
Similarity threshold above which linked nodes are treated as duplicates. For each pair of nodes linked with a weight (usually a similarity) greater than this value, one of the two nodes is eliminated. The default (null) corresponds to positive infinity, i.e. no de-duplication.
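As a small illustration of these threshold semantics, the following Python sketch shows how a null threshold and the strict "greater than" comparison might be interpreted. The helper names `effective_threshold` and `is_duplicate_link` are hypothetical and not part of the step.

```python
import math


def effective_threshold(duplicate_threshold):
    # Hypothetical helper: null (None) is read as positive infinity.
    return math.inf if duplicate_threshold is None else float(duplicate_threshold)


def is_duplicate_link(weight, duplicate_threshold):
    # A link marks a duplicate pair only if its weight is strictly
    # greater than the threshold.
    return weight > effective_threshold(duplicate_threshold)
```

Under this reading, a link whose weight equals the threshold does not trigger de-duplication, and with a null threshold no link ever does.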