Cluster network

network · louvain · community detection

Identify clusters in the network.

At the moment, the only supported clustering algorithm is Louvain. Louvain identifies the communities in a network by optimizing the modularity of the whole network, a measure that compares the density of edges inside communities with the density of edges between communities. The result is a column of cluster IDs (integers). The value -1 is reserved for nodes in very small clusters, which are grouped together into a single "noise" cluster.
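As a rough illustration of what modularity measures, the sketch below computes the standard (Newman) modularity for a given assignment of nodes to clusters. It is not the step's implementation: the function name `modularity`, the tuple layout of `links` and the `membership` mapping are assumptions made for this example, and the step's `resolution` parameter is deliberately left out.

```python
from collections import defaultdict


def modularity(links, membership):
    """Standard modularity of an undirected, weighted network.

    links: iterable of (source, target, weight) tuples (hypothetical layout).
    membership: dict mapping each node to its cluster ID.
    """
    total_weight = 0.0
    internal = defaultdict(float)  # weight of links inside each cluster
    degree = defaultdict(float)    # summed weighted degree of each cluster
    for source, target, weight in links:
        total_weight += weight
        degree[membership[source]] += weight
        degree[membership[target]] += weight
        if membership[source] == membership[target]:
            internal[membership[source]] += weight
    if total_weight == 0:
        return 0.0
    # Per cluster: observed share of internal weight minus the share
    # expected if links were placed at random with the same degrees.
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Two triangles joined by a single link (all weights 1).
links = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

print(modularity(links, two_clusters))  # about 0.357 (5/14)
print(modularity(links, one_cluster))   # 0.0
```

Splitting the network into its two triangles scores higher than lumping every node into one cluster, which is the kind of partition a modularity-optimizing algorithm such as Louvain is searching for.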


The following configuration favors smaller clusters (a resolution below the default of 0.5) and treats any cluster with fewer than 5 nodes as noise:

cluster_network(links, {
  "resolution": 0.3,
  "noise": 5
}) -> (ds.cluster)


The following are the step's expected inputs and outputs and their specific types.

cluster_network(links: dataset, {"param": value}) -> (cluster: category)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.


links: dataset

A dataset of links with source, target and weight columns, usually generated by a prior link_[x] step.


cluster: column:category

Column containing cluster tags.


algorithm: string = "louvain"

Clustering algorithm to use.

Must be one of: "louvain"

resolution: number = 0.5

The higher this value, the bigger the clusters.

Range: 0 < resolution ≤ 1

noise: integer = 1

The minimum number of nodes a cluster must contain to not be considered noise. The larger the value, the more conservative the clustering: nodes in clusters below this size are assigned to the noise cluster (-1).

Range: 0 ≤ noise < inf
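The effect of the noise threshold can be sketched in a few lines. This is an illustration under assumptions rather than the step's actual code: the helper name `mark_noise` is hypothetical, and whether the real step renumbers the remaining cluster IDs afterwards is not specified here.

```python
from collections import Counter


def mark_noise(cluster_ids, noise=1):
    """Replace the IDs of clusters with fewer than `noise` nodes by -1.

    cluster_ids: one cluster ID per node (hypothetical input layout).
    noise: minimum cluster size to not be considered noise.
    """
    sizes = Counter(cluster_ids)  # number of nodes in each cluster
    return [cid if sizes[cid] >= noise else -1 for cid in cluster_ids]


ids = [0, 0, 0, 1, 1, 2]
print(mark_noise(ids, noise=1))  # [0, 0, 0, 1, 1, 2]
print(mark_noise(ids, noise=2))  # [0, 0, 0, 1, 1, -1]
print(mark_noise(ids, noise=3))  # [0, 0, 0, -1, -1, -1]
```

With the default of 1, every non-empty cluster is kept, so no node is labeled as noise.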