Train and store a machine learning model to be loaded at a later point for prediction.
min_cluster_size
.
Can be used to predict the clusters of new data without changing the existing clustering.
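As a rough illustration of what predicting clusters for new data "without changing the existing clustering" can mean, the sketch below assigns each new point the label of its nearest training point and marks it as noise (`-1`) if nothing is close enough. This is a simplified stand-in, not the method Graphext or HDBSCAN actually uses; the function name `predict_clusters` and the `max_distance` threshold are assumptions made for the example.

```python
import math


def predict_clusters(train_points, train_labels, new_points, max_distance):
    """Assign each new point the label of its nearest training point.

    Hypothetical helper for illustration only. Points farther than
    max_distance from every training point are labelled -1 (noise).
    The training labels are only read, never modified.
    """
    predictions = []
    for point in new_points:
        best_label, best_dist = -1, math.inf
        for train_point, label in zip(train_points, train_labels):
            dist = math.dist(point, train_point)
            if dist < best_dist:
                best_dist, best_label = dist, label
        predictions.append(best_label if best_dist <= max_distance else -1)
    return predictions


# Toy data: two well-separated clusters found during "training".
train_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_labels = [0, 0, 1, 1]

new_points = [(0.2, 0.1), (4.8, 5.1), (20.0, 20.0)]
predicted = predict_clusters(train_points, train_labels, new_points, max_distance=1.0)
print(predicted)
```

The key property is that the stored clustering is treated as fixed: new rows receive labels, but existing labels are never recomputed.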
Examples
ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Inputs
Outputs
step(..., {"param": "value", ...}) -> (output)
.
Parameters
feature_encoder
option below.
null
), Graphext chooses automatically how to convert any column types the model may not understand natively to a numeric type.
A configuration object can be passed instead to overwrite specific parameter values with respect to their default values.
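The idea of a configuration object that overwrites only specific values relative to the defaults can be sketched as a recursive merge. The key names (`number`, `category`, `imputer`, `scaler`, `encoder`) and the default values below are hypothetical and chosen only for illustration; they are not Graphext's actual schema or defaults.

```python
import copy

# Hypothetical defaults, for illustration only (not Graphext's real ones).
DEFAULT_ENCODER_CONFIG = {
    "number": {"imputer": "Mean", "scaler": "Standard"},
    "category": {"imputer": "MostFrequent", "encoder": "OneHot"},
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with the values in overrides applied on top."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key rather than replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_feature_encoder(feature_encoder):
    """None means 'use the defaults'; an object overrides selected values."""
    if feature_encoder is None:
        return copy.deepcopy(DEFAULT_ENCODER_CONFIG)
    return merge_config(DEFAULT_ENCODER_CONFIG, feature_encoder)


# Only the scaler for numeric columns is changed; everything else is kept.
resolved = resolve_feature_encoder({"number": {"scaler": "Robust"}})
print(resolved)
```

Under this reading, passing a partial object leaves every unspecified setting at its default value.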
Properties
Mean
Median
MostFrequent
Const
None
Standard
Robust
KNN
None
scaler
function.
Details depend on the particular scaler used.
Options
MostFrequent
Const
None
OneHot
Label
Ordinal
Binary
Frequency
None
Standard
Robust
KNN
None
list[category]
for short). May contain either a single configuration for all multilabel variables, or two different configurations for low- and high-cardinality variables.
For further details pick one of the two options below.
Options
Binarizer
TfIdf
None
Euclidean
KNN
Norm
None
Properties
Array items
day
dayofweek
dayofyear
hour
minute
month
quarter
season
second
week
weekday
weekofyear
year
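To make the component names above concrete, here is a minimal sketch that derives several of them from a single timestamp using Python's standard library. The helper `datetime_components` is hypothetical, and conventions such as Monday being day 0 and ISO week numbering are assumptions; Graphext's own definitions may differ. Components whose definition cannot be inferred from the list alone (`season`, `weekday`, `weekofyear`) are left out.

```python
from datetime import datetime


def datetime_components(ts, components):
    """Extract the requested named components from a datetime.

    Hypothetical helper for illustration only.
    """
    extractors = {
        "day": lambda t: t.day,
        "dayofweek": lambda t: t.weekday(),           # assumption: Monday = 0
        "dayofyear": lambda t: t.timetuple().tm_yday,
        "hour": lambda t: t.hour,
        "minute": lambda t: t.minute,
        "month": lambda t: t.month,
        "quarter": lambda t: (t.month - 1) // 3 + 1,
        "second": lambda t: t.second,
        "week": lambda t: t.isocalendar()[1],         # assumption: ISO week number
        "year": lambda t: t.year,
    }
    return {name: extractors[name](ts) for name in components}


features = datetime_components(
    datetime(2024, 3, 15, 14, 30),
    ["day", "dayofweek", "dayofyear", "hour", "month", "quarter", "week", "year"],
)
print(features)
```

Each selected component becomes one numeric feature, so a shorter list yields fewer columns for the model to consume.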
Array items
day
dayofweek
dayofyear
hour
month
Mean
Median
MostFrequent
Const
None
Standard
Robust
KNN
None
Euclidean
KNN
Norm
None
list[number]
for short).
include_text_features
below to activate it.
Properties
Euclidean
KNN
Norm
None
embed_text or embed_text_with_model.
Properties
eom
), can sometimes pick one or two large clusters and then a number of small extra clusters. If you’re interested in a more fine-grained clustering with a larger number of more homogeneously sized clusters, you may prefer selecting leaf clustering (leaf).
Values must be one of the following:
eom
leaf
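The difference between the two selection methods can be sketched on a toy cluster tree. In the simplified version below, `leaf` returns every leaf of the tree, while `eom` (excess of mass) keeps a parent cluster whenever its stability is at least the combined stability of the clusters selected beneath it. The tree, the stability values and the function names are invented for illustration, and the root is assumed never to be selected as a cluster.

```python
def select_leaf(children, root):
    """Leaf selection: every node without children becomes a cluster."""
    selected, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            selected.append(node)
    return sorted(selected)


def select_eom(children, stability, root):
    """Excess-of-mass selection on a toy tree (simplified sketch)."""

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            return stability[node], [node]
        total, chosen = 0.0, []
        for kid in kids:
            kid_stability, kid_selection = visit(kid)
            total += kid_stability
            chosen.extend(kid_selection)
        # Keep the parent if it is at least as stable as its descendants.
        # Assumption: the root itself is never selected.
        if node != root and stability[node] >= total:
            return stability[node], [node]
        return total, chosen

    return sorted(visit(root)[1])


# Invented toy tree: node -> child nodes, plus a stability score per node.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6], 5: [7, 8]}
stability = {1: 10.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0, 6: 1.5, 7: 1.0, 8: 1.0}

eom_clusters = select_eom(children, stability, root=0)
leaf_clusters = select_leaf(children, root=0)
print(eom_clusters, leaf_clusters)
```

On this toy tree `eom` keeps the large, stable cluster 1 intact and returns three clusters, whereas `leaf` splits everything down to five smaller ones, which mirrors the trade-off described above.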