For each pair of nodes connected by a link that indicates a similarity greater than a specified threshold, keeps only one of the two nodes and rewires the deleted node’s incoming and outgoing links to point to the “surviving” node.
Usage
The following example shows how the step can be used in a recipe.
Examples
To de-duplicate pairs of nodes with a link weight (similarity) greater than 0.9
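The behaviour described for this step can be illustrated outside of a recipe with a small Python sketch. This is not Graphext's implementation or recipe syntax. The function name `dedupe_nodes`, the representation of links as `(source, target, weight)` tuples, the rule that the node appearing first survives, the transitive merging of chains of similar nodes, and keeping the largest weight among parallel links are all assumptions made for illustration.

```python
def dedupe_nodes(nodes, links, threshold):
    """Hypothetical sketch: drop one node of every pair linked with
    weight > threshold and rewire the dropped node's links to the survivor."""
    order = {n: i for i, n in enumerate(nodes)}
    survivor = {n: n for n in nodes}  # each node initially survives as itself

    def find(n):
        # Follow survivor pointers to the final surviving node.
        while survivor[n] != n:
            survivor[n] = survivor[survivor[n]]
            n = survivor[n]
        return n

    # Assumption: of two similar nodes, the one listed first is kept.
    for src, dst, weight in links:
        if weight > threshold:
            a, b = find(src), find(dst)
            if a != b:
                keep, drop = (a, b) if order[a] < order[b] else (b, a)
                survivor[drop] = keep

    kept = [n for n in nodes if find(n) == n]

    # Rewire every link to the surviving endpoints; skip links that would
    # become self-loops and keep the largest weight among parallel links.
    rewired = {}
    for src, dst, weight in links:
        a, b = find(src), find(dst)
        if a == b:
            continue
        rewired[(a, b)] = max(weight, rewired.get((a, b), weight))

    return kept, [(a, b, w) for (a, b), w in rewired.items()]


nodes = ["a", "b", "c", "d"]
links = [("a", "b", 0.95), ("b", "c", 0.5), ("d", "b", 0.3), ("c", "d", 0.2)]
kept, rewired = dedupe_nodes(nodes, links, threshold=0.9)
```

Here only the link between `a` and `b` exceeds 0.9, so `b` is dropped and its links to `c` and from `d` are re-attached to `a`.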
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Inputs
A dataset containing the nodes (rows) to de-duplicate and the links between those nodes.
Outputs
A new dataset containing the same columns as the input data, but without duplicate rows and with connections rewired such that none points to a deleted node.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Similarity threshold for candidate nodes to be eliminated.
Any node linked to another node with a weight (usually similarity) greater than this value will be eliminated. The default (null) corresponds to positive infinity (no de-duplication).
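The effect of the default can be sketched in Python as follows. The helper names `effective_threshold` and `is_duplicate_link` are hypothetical and not part of Graphext; the sketch only assumes, as stated above, that the comparison is strictly "greater than" and that a missing (null) threshold behaves like positive infinity.

```python
import math


def effective_threshold(threshold=None):
    # A null (None) threshold is treated as positive infinity.
    return math.inf if threshold is None else float(threshold)


def is_duplicate_link(weight, threshold=None):
    # A link marks a duplicate pair only if its weight is strictly greater
    # than the threshold, which no finite weight can be when it is infinite.
    return weight > effective_threshold(threshold)
```

With no threshold given, `is_duplicate_link` is false for every finite weight, so no node would be eliminated.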