cluster_network

Louvain is currently the only supported clustering algorithm. It identifies communities in a network by optimizing the modularity of the whole network, a measure of how dense the edges inside communities are relative to the edges between communities. The result is a column of integer cluster IDs. The value -1 is reserved for nodes in very small clusters, which are grouped into a single “noise” cluster.
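To make the idea of modularity concrete, the following Python sketch scores a given partition of a small weighted graph. It assumes the standard Newman modularity for undirected graphs. The source does not say which variant the step optimizes or how `resolution` enters it, and the function name `modularity` and the toy graph are purely illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard (Newman) modularity of a partition of an undirected graph.

    edges: list of (source, target, weight) tuples, each edge listed once.
    community: dict mapping every node to its community id.
    """
    total_weight = sum(w for _, _, w in edges)
    intra = defaultdict(float)   # edge weight inside each community
    degree = defaultdict(float)  # summed weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            intra[community[u]] += w
    # Fraction of weight inside a community minus the fraction expected
    # if edges were placed at random with the same node degrees.
    return sum(
        intra[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Toy graph (illustrative): two triangles joined by a single edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, merged), 3))  # 0.0
```

Splitting the graph into its two triangles scores higher than lumping every node into one community, which is the kind of partition a modularity-optimizing algorithm such as Louvain searches for.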

Usage

The following example shows how the step can be used in a recipe.

Examples

The following configuration favours smaller clusters (a resolution below the default of 0.5) and treats clusters of 5 nodes or fewer as noise:

cluster_network(ds.targets, ds.weights, {
  "resolution": 0.3,
  "noise": 5
}) -> (ds.cluster)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

algorithm

string

default: "louvain"

Clustering algorithm to use. Values must be one of the following:

louvain

resolution

number

default: 0.5

The higher this value, the bigger the clusters. Values must be in the following range:

0 < resolution ≤ 1

noise

integer

default: 1

The larger the value, the more conservative the clustering. Clusters with this number of nodes or fewer will be considered noise. Values must be in the following range:

0 ≤ noise < inf
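The effect of the noise threshold can be sketched in a few lines of Python. This is only an illustration of the rule stated above (clusters with `noise` nodes or fewer get the ID -1), not the step's actual implementation. The helper `mark_noise` is hypothetical, and whether the step renumbers the remaining cluster IDs afterwards is not stated in the source.

```python
from collections import Counter


def mark_noise(labels, noise=1):
    """Replace the label of every cluster with `noise` nodes or fewer by -1."""
    sizes = Counter(labels)  # number of nodes per cluster id
    return [-1 if sizes[label] <= noise else label for label in labels]


# Hypothetical raw cluster ids for six nodes: clusters of size 3, 2 and 1.
labels = [0, 0, 0, 1, 1, 2]

print(mark_noise(labels, noise=0))  # [0, 0, 0, 1, 1, 2]
print(mark_noise(labels, noise=1))  # [0, 0, 0, 1, 1, -1]
print(mark_noise(labels, noise=2))  # [0, 0, 0, -1, -1, -1]
```

With noise set to 0 no cluster can qualify, so no node is marked as noise. Raising the value moves progressively larger clusters into the noise group.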

query

string

A query in the Graphext advanced query syntax, used to select rows.
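The defaults and ranges listed for these parameters can be checked before a recipe is run. The Python sketch below is one way to do so. `validate_config` and `DEFAULTS` are hypothetical names and not part of Graphext, and the check covers only the constraints documented on this page.

```python
DEFAULTS = {"algorithm": "louvain", "resolution": 0.5, "noise": 1}


def validate_config(config):
    """Return `config` merged with the documented defaults.

    Raises ValueError if a value falls outside the documented range.
    """
    unknown = set(config) - set(DEFAULTS) - {"query"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULTS, **config}
    if merged["algorithm"] != "louvain":
        raise ValueError("algorithm must be 'louvain'")
    if not 0 < merged["resolution"] <= 1:
        raise ValueError("resolution must satisfy 0 < resolution <= 1")
    noise = merged["noise"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(noise, bool) or not isinstance(noise, int) or noise < 0:
        raise ValueError("noise must be an integer >= 0")
    return merged


# The configuration from the example in this page passes the check.
print(validate_config({"resolution": 0.3, "noise": 5}))
```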
