Currently, the only supported clustering algorithm is Louvain. Louvain tries to identify the communities in a network by optimizing the modularity of the whole network, a measure that compares the density of edges inside communities with the density of edges between communities. The result is a column of cluster IDs (integers), where the value -1 is reserved for nodes in very small clusters, which are grouped into a “noise” cluster.
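To make the notion of modularity concrete, the following is a minimal Python sketch of the standard (Newman) modularity score for an undirected, weighted graph. It is an illustration only, not the implementation used by this step; the function name `modularity` and the edge-list input format are assumptions made for the example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Standard modularity Q of an undirected weighted graph (illustrative).

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community: dict mapping node -> community id.
    """
    total = 0.0                    # total edge weight m
    degree = defaultdict(float)    # weighted degree of each node
    internal = defaultdict(float)  # edge weight falling inside each community
    for u, v, w in edges:
        total += w
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w

    # Sum of node degrees per community.
    comm_degree = defaultdict(float)
    for node, k in degree.items():
        comm_degree[community[node]] += k

    # Q = sum over communities of (internal share - expected share).
    return sum(
        internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
        for c in comm_degree
    )

# Two triangles joined by a single edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph into its two triangles scores higher than lumping all nodes together, which is the kind of partition a modularity-optimizing algorithm such as Louvain searches for.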

Usage

The following example shows how the step can be used in a recipe.

Examples

The following configuration uses a resolution below the default, which favours smaller clusters, and treats any cluster of 5 nodes or fewer as noise:
cluster_subnetwork(ds, {
  "targets": "targets",
  "weights": "weights",
  "resolution": 0.3,
  "noise": 5
}) -> (ds.cluster)
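As a rough illustration of the data this example expects, the Python sketch below expands per-row `targets` and `weights` lists into an explicit edge list. It assumes that the row position is the source node, that targets refer to other row positions, and that the two lists are aligned element by element; the sample rows and the helper `to_edges` are hypothetical and not part of the step's API.

```python
# Hypothetical rows: the row position is the source node of each link.
rows = [
    {"targets": [1, 2], "weights": [0.9, 0.4]},  # links 0 -> 1 and 0 -> 2
    {"targets": [2],    "weights": [0.7]},       # link 1 -> 2
    {"targets": [],     "weights": []},          # node 2 has no outgoing links
]

def to_edges(rows, targets="targets", weights="weights"):
    """Expand per-row target/weight lists into (source, target, weight) tuples."""
    edges = []
    for source, row in enumerate(rows):
        # Assumes weights[i] belongs to the link towards targets[i].
        for target, weight in zip(row[targets], row[weights]):
            edges.append((source, target, weight))
    return edges

edges = to_edges(rows)
```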

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in
dataset
required
An input dataset to use as source of the network.
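cluster
column
required
An output column containing the integer cluster IDs, with -1 marking nodes assigned to the noise cluster.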

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object passed as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

targets
string (ds_in.column:list[number])
required
Name of column containing the link targets. Source is implied in the index.
weights
string (ds_in.column:list[number])
Name of column containing the link weights.
query
string
required
A query in the Graphext advanced query syntax, used to select rows.
algorithm
string
default:"louvain"
Clustering algorithm to use. Values must be one of the following:
  • louvain
resolution
number
default:"0.5"
The higher this value, the bigger the clusters. Values must be in the following range:
0 < resolution ≤ 1
noise
integer
default:"1"
The larger the value, the more conservative the clustering. Clusters with this number of nodes or fewer will be considered noise. Values must be in the following range:
0 ≤ noise < inf
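The effect of the noise threshold can be sketched in Python as a post-processing pass over cluster labels. This is an assumption about the behaviour described above, not the step's actual code; the helper `mark_noise` is hypothetical, and whether the remaining cluster IDs are renumbered afterwards is not specified here.

```python
from collections import Counter

def mark_noise(labels, noise=1):
    """Relabel members of clusters with `noise` nodes or fewer as -1."""
    sizes = Counter(labels)  # number of nodes per cluster ID
    return [-1 if sizes[label] <= noise else label for label in labels]

labels = [0, 0, 0, 1, 1, 2]
default_result = mark_noise(labels)           # only the singleton cluster is noise
strict_result = mark_noise(labels, noise=2)   # clusters of size 1 and 2 are noise
```

With the default of 1, only single-node clusters end up in the noise cluster; raising the value moves progressively larger clusters into it.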