Dimensionality reduction with “Uniform Manifold Approximation and Projection” (UMAP)

Generates numeric embeddings (vectors) of the input data with reduced dimensionality, preserving local and global similarities between data points. Useful for visualisation (for example, arranging data points in 2 dimensions according to their similarity) or for creating nearest-neighbour graphs/networks (in the latter case, see also the link_embeddings step).

Can be used in supervised mode (with a target column provided as a parameter) or in unsupervised mode (without a target).
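As a rough illustration of the two modes, the sketch below calls the open-source umap-learn package directly. Whether Graphext invokes it in exactly this way is an assumption, and the helper name fit_umap and its arguments are hypothetical.

```python
def fit_umap(features, target=None, seed=42):
    """Embed `features` in 2 dimensions, optionally guided by `target`.

    Hypothetical helper, not part of Graphext. Requires `pip install umap-learn`.
    """
    import umap  # imported lazily so the helper can be defined without the package

    reducer = umap.UMAP(n_components=2, random_state=seed)
    # Passing y makes umap-learn use the labels (supervised mode);
    # y=None gives the usual unsupervised embedding.
    embedding = reducer.fit_transform(features, y=target)
    return reducer, embedding


# Example usage (assumes scikit-learn is installed for the sample data):
# from sklearn.datasets import load_iris
# X, y = load_iris(return_X_y=True)
# _, unsupervised = fit_umap(X)
# _, supervised = fit_umap(X, target=y)
```

The fitted reducer is returned alongside the embedding because, in umap-learn, the same object can later embed new data via its transform method.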

The output will always be a new column with the trained model’s predictions on the training data, as well as a saved and named model file that can be used in other projects for prediction of new data.

target
string

Target variable. Name of the column that contains your target values.

encode_features
boolean
default: "true"

Toggle encoding of feature columns. When enabled, Graphext will automatically convert columns of any type to a numeric type before fitting the model. How this conversion is done can be configured using the feature_encoder option below.

If disabled, any model trained in this step will assume that input data is already in an appropriate format (e.g. numerical and not containing any missing values).

feature_encoder
[null, object]

Configures encoding of feature columns. By default (null), Graphext chooses automatically how to convert any column types the model may not understand natively to a numeric type.

Alternatively, a configuration object can be passed to override the default values of specific parameters.
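The null-or-object behaviour can be pictured as a merge of user-supplied keys over a set of defaults. The following is a minimal sketch of that idea only, not Graphext's implementation: the function name resolve_encoder_config and the key hypothetical_option are made up for illustration, while include_text_features is the option documented in this section.

```python
# Assumed defaults for illustration only.
DEFAULT_ENCODER_CONFIG = {
    "include_text_features": False,  # documented as disabled by default
    "hypothetical_option": "auto",   # placeholder key, NOT a real Graphext option
}


def resolve_encoder_config(feature_encoder=None):
    """Return the effective config: defaults, with user-supplied keys overriding them."""
    if feature_encoder is None:  # null: rely entirely on the defaults
        return dict(DEFAULT_ENCODER_CONFIG)
    # Keys present in the user's object win; all other keys keep their defaults.
    return {**DEFAULT_ENCODER_CONFIG, **feature_encoder}


automatic = resolve_encoder_config()
with_text = resolve_encoder_config({"include_text_features": True})
```

Only the keys present in the passed object change; everything else keeps its default value.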

include_text_features
boolean

Whether to include or ignore text columns when processing the input data. Enabling this converts texts to their TF-IDF representation: each text becomes an N-dimensional vector in which each component measures the “over-representation” of a specific word (or n-gram) in that text relative to its overall frequency in the whole dataset. This is disabled by default because it is often better to convert texts explicitly in a previous step, such as embed_text or embed_text_with_model.
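To make the TF-IDF idea concrete, here is a small pure-Python sketch of one common formulation (term frequency times log of inverse document frequency). The exact weighting, tokenisation and n-gram handling Graphext uses are not specified here, so treat the function tfidf and its whitespace tokeniser as assumptions.

```python
import math
from collections import Counter


def tfidf(texts):
    """Return one {word: weight} dict per text, using tf * log(N / df)."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty texts
        vectors.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


vectors = tfidf(["the cat sat", "the dog barked", "the cat purred"])
```

In this toy dataset, "the" occurs in every text and therefore gets a weight of zero, while "sat", which occurs in only one text, is weighted higher than "cat". Each dict is a sparse view of the N-dimensional vector, with absent words standing for zero components.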

params
object

Model parameters. See the official UMAP documentation for details.
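For orientation, a params object might look like the sketch below. The names and values shown are the constructor arguments and defaults of the umap-learn UMAP class; that this step accepts the same names, and uses the same defaults, is an assumption to check against the step's schema.

```python
# Parameter names and default values taken from umap-learn's UMAP class.
params = {
    "n_neighbors": 15,      # size of the local neighbourhood; larger values favour global structure
    "min_dist": 0.1,        # minimum spacing of points in the embedding; smaller gives tighter clusters
    "n_components": 2,      # dimensionality of the output embedding
    "metric": "euclidean",  # distance measure used in the input space
}

# With umap-learn installed, these could be passed straight to the model:
# import umap
# reducer = umap.UMAP(**params)
```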

seed
integer

Seed for the random number generator, ensuring reproducible results.

Values must be in the following range:

0 ≤ seed < inf
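The effect of a fixed seed can be sketched with Python's standard random module, used here as a stand-in for whatever generator the step uses internally. The helper sample_with_seed is hypothetical; it also enforces the non-negative range stated above.

```python
import random


def sample_with_seed(seed, n=3):
    """Draw n pseudo-random numbers from a generator initialised with `seed`."""
    if not isinstance(seed, int) or seed < 0:
        raise ValueError("seed must be a non-negative integer")
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


first_run = sample_with_seed(42)
second_run = sample_with_seed(42)
other_seed = sample_with_seed(43)
```

The same seed reproduces the same sequence on every run, while a different seed gives a different one.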