cluster_network
Identify clusters in the network.
Louvain is currently the only supported clustering algorithm. It identifies the communities in a network by optimizing the modularity of the whole network, a measure of the density of edges inside communities relative to edges between communities. The result is a column of integer cluster IDs, where the value -1 is reserved for nodes in very small clusters, which are grouped into a single “noise” cluster.
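To make the modularity objective concrete, here is a minimal Python sketch that computes the standard (Newman) modularity of a given partition of an undirected, unweighted graph. It is an illustration only, not the step's implementation: the function name `modularity` and the toy graph are hypothetical, and the step itself may take edge weights and a resolution parameter into account, which this sketch ignores.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs without self-loops (assumption of this sketch).
    community: dict mapping each node to its community ID.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community

    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    # Sum of node degrees per community.
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d

    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random rewiring).
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # one community per triangle
single = {node: 0 for node in range(6)}        # everything in one community

q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

On this toy graph, assigning each triangle to its own community scores higher than placing all nodes in a single community, which is the kind of partition a modularity-optimizing algorithm such as Louvain searches for.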
Usage
The following example shows how the step can be used in a recipe.
The following configuration allows for relatively small clusters and treats only a few data points as noise:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Clustering algorithm to use.
Values must be one of the following:
louvain
The higher this value, the bigger the clusters.
Values must be in the following range:
The larger the value, the more conservative the clustering. Clusters with this number of nodes or fewer are considered noise.
Values must be in the following range:
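As an illustration of the noise threshold described above, the following Python sketch relabels every cluster with that many nodes or fewer as -1. The function name `mark_noise` and its `noise` argument are hypothetical names chosen for this example. Whether the step renumbers the remaining cluster IDs afterwards is not stated here, so the sketch leaves them unchanged.

```python
from collections import Counter


def mark_noise(cluster_ids, noise):
    """Replace the IDs of clusters with `noise` nodes or fewer by -1."""
    sizes = Counter(cluster_ids)  # number of nodes per cluster ID
    return [-1 if sizes[c] <= noise else c for c in cluster_ids]


# Three clusters of sizes 3, 2 and 1.
raw_ids = [0, 0, 0, 1, 1, 2]

strict = mark_noise(raw_ids, noise=2)   # clusters of size <= 2 become noise
lenient = mark_noise(raw_ids, noise=1)  # only the singleton becomes noise
```

With the larger threshold, both the two-node cluster and the singleton are merged into the noise cluster, which is why larger values make the clustering more conservative.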
A query in the Graphext advanced query syntax, used to select rows.