Usage
The following example shows how the step can be used in a recipe.
Examples
The following configuration applies the algorithm with the default values:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
Column containing merged categories.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Determines which categories will be merged.
After hierarchically clustering all categories, clusters of categories closer than this distance will be merged into one.
Also see details in scikit-learn’s Agglomerative Clustering.
Values must be in the following range:
Which linkage criterion to use in the clustering.
The distance between individual categories is always the cosine distance between their embeddings. This parameter determines how the distance between clusters of embeddings is calculated, e.g. as the maximum distance between categories in two clusters (“complete”), the minimum (“single”), etc.
Also see details in scikit-learn’s Agglomerative Clustering.
Values must be one of the following:
single
ward
complete
average
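
To illustrate how the distance threshold and the linkage criterion interact, here is a minimal pure-Python sketch. It is not the step's actual implementation. The function names, the toy 2-D "embeddings" and the category labels are all made up for illustration. It assumes merging stops as soon as the closest pair of clusters is at or above the threshold, which mirrors how scikit-learn's distance threshold behaves. "ward" is left out because it minimises within-cluster variance and cannot be expressed as a simple reduction over pairwise distances.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# How each linkage reduces all pairwise distances between two clusters
# to a single cluster-to-cluster distance ("ward" intentionally omitted).
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}


def cluster_distance(cluster_a, cluster_b, embeddings, linkage):
    """Distance between two clusters of category names under a linkage."""
    pairwise = [
        cosine_distance(embeddings[a], embeddings[b])
        for a in cluster_a
        for b in cluster_b
    ]
    return LINKAGES[linkage](pairwise)


def merge_categories(embeddings, threshold, linkage="average"):
    """Greedily merge the closest clusters while they are closer than threshold."""
    clusters = [[name] for name in embeddings]
    while len(clusters) > 1:
        dist, i, j = min(
            (cluster_distance(clusters[i], clusters[j], embeddings, linkage), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if dist >= threshold:  # assumption: no merge at or above the threshold
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def unit_vector(degrees):
    """Toy 2-D embedding: a unit vector at the given angle."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]


# Hypothetical categories with made-up embeddings (angles chosen by hand).
toy_embeddings = {
    "nyc": unit_vector(0),
    "new york": unit_vector(20),
    "new york state": unit_vector(45),
    "boston": unit_vector(90),
}

for name in ("single", "complete", "average"):
    print(name, merge_categories(toy_embeddings, threshold=0.1, linkage=name))
```

With these toy vectors and a threshold of 0.1, "single" chains the first three labels into one cluster because each is close to its nearest neighbour, while "complete" and "average" merge only the first two, since the third label is too far from "nyc".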