At the moment the only supported clustering algorithm is Louvain. Louvain tries to identify the communities in a network by optimizing the modularity of the whole network, a measure of how dense the edges inside communities are compared with the edges between communities. The result is a column of integer cluster IDs. The value -1 is reserved for nodes in very small clusters, which are grouped into a single “noise” cluster.
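
To make the notion of modularity concrete, here is a minimal Python sketch that scores a given partition of a small weighted, undirected graph using the standard Newman formulation. It is an illustration only, not the step's implementation: the function name `modularity` and the toy graph are assumptions, and it does not model the step's resolution parameter.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q of a weighted, undirected graph (illustrative helper).

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once.
    membership: dict mapping node -> community id.
    """
    total = 0.0                    # m: total edge weight in the graph
    internal = defaultdict(float)  # edge weight falling inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for u, v, w in edges:
        total += w
        degree[membership[u]] += w
        degree[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    if total == 0:
        return 0.0
    # Q = sum over communities of (internal share) - (expected share)
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_community = {node: 0 for node in range(6)}

q_two = modularity(edges, two_communities)
q_one = modularity(edges, one_community)
```

Splitting the graph into its two triangles scores higher than lumping all nodes together, which is the kind of partition a modularity-optimizing algorithm such as Louvain searches for.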

Usage

The following example shows how the step can be used in a recipe.

Examples

The following configuration uses a resolution below the default, which favours smaller clusters, and treats clusters of 5 nodes or fewer as noise:
cluster_network(ds.targets, ds.weights, {
  "resolution": 0.3,
  "noise": 5
}) -> (ds.cluster)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
targets
column[list[number]]
required
A column containing link targets. Source is implied in the index.
*weights
column[list[number]]
cluster
column
required
A column containing cluster tags.
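
As a rough illustration of the input format, the Python sketch below expands per-row target lists into explicit edges. It rests on assumptions not confirmed here: that the values in targets are row positions, that weights is aligned with targets element by element, and that a missing weights column means every link has weight 1.0. The helper `to_edge_list` is hypothetical and not part of the product.

```python
def to_edge_list(targets, weights=None):
    """Expand per-row target lists into (source, target, weight) triples.

    Assumes row i of `targets` lists the row positions that node i links to,
    and that `weights`, when given, has one weight per target in each row.
    """
    edges = []
    for source, row_targets in enumerate(targets):
        if weights is None:
            row_weights = [1.0] * len(row_targets)  # assumed default weight
        else:
            row_weights = weights[source]
        if len(row_weights) != len(row_targets):
            raise ValueError(
                f"row {source}: {len(row_targets)} targets but {len(row_weights)} weights"
            )
        for target, weight in zip(row_targets, row_weights):
            edges.append((source, target, weight))
    return edges


# Four rows (nodes); the source of each link is the row's own position.
targets = [[1, 2], [0], [0, 3], [2]]
weights = [[0.9, 0.4], [0.9], [0.4, 0.7], [0.7]]
edge_list = to_edge_list(targets, weights)
unweighted = to_edge_list(targets)
```

Here row 0 contributes the links 0 → 1 and 0 → 2, which is what "source is implied in the index" means under these assumptions.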

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

algorithm
string
default:"louvain"
Clustering algorithm to use. Values must be one of the following:
  • louvain
resolution
number
default:"0.5"
The higher this value, the bigger the clusters. Values must be in the following range:
0 < resolution ≤ 1
noise
integer
default:"1"
The larger the value, the more conservative the clustering. Clusters with this number of nodes or fewer will be considered noise. Values must be in the following range:
0 ≤ noise < inf
query
string
The Graphext advanced query syntax used to select rows.
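
The effect of the noise parameter can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the step's actual code: the helper `mark_noise` is hypothetical, and whether the real step renumbers the remaining clusters afterwards is not specified here.

```python
from collections import Counter

NOISE = -1  # label reserved for nodes in very small clusters


def mark_noise(labels, noise=1):
    """Relabel every cluster with `noise` or fewer members as -1."""
    sizes = Counter(labels)
    return [NOISE if sizes[label] <= noise else label for label in labels]


# Three raw clusters with 6, 3 and 1 members respectively.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]

with_default = mark_noise(labels, noise=1)  # only the single-node cluster becomes noise
with_five = mark_noise(labels, noise=5)     # the 3-node cluster becomes noise as well
```

Raising noise from 1 to 5 moves the three-node cluster into the noise group too, which is why larger values give a more conservative clustering.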