Dimensionality reduction with “Uniform Manifold Approximation and Projection” (UMAP)

Generates numeric embeddings (vectors) of the input data with reduced dimensionality, aiming to preserve the local similarity structure between data points and, to a lesser extent, their global structure. Can be used for visualisation, for example to arrange data in 2 dimensions according to their similarity, or to create nearest-neighbour graphs/networks (for the latter, see also the step link_embeddings).
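Once data points have low-dimensional embeddings, a nearest-neighbour graph links each point to its k closest neighbours in the embedding space. The sketch below is a generic, brute-force illustration of that idea in plain Python; it is not the implementation of link_embeddings, and the function name knn_graph is hypothetical.

```python
import math

def knn_graph(embeddings, k):
    """Hypothetical helper: link each point to its k nearest neighbours.

    Returns a list of (source, target, distance) edges, using Euclidean
    distance and a brute-force O(n^2) search (fine for small examples).
    """
    edges = []
    for i, a in enumerate(embeddings):
        # Distances from point i to every other point, paired with their index
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(embeddings) if j != i)
        # Keep only the k closest points as outgoing edges
        for d, j in dists[:k]:
            edges.append((i, j, d))
    return edges

# Two well-separated pairs of 2-D embeddings
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
edges = knn_graph(points, k=1)
```

With k=1 each point links only to its partner in the same pair, so the graph splits into two clusters, mirroring how similar items end up connected in a network built from embeddings.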

Can be used in supervised mode (by providing a target column as a parameter) or in unsupervised mode (without a target).

The output is always a new column containing the trained model’s embeddings of the training data, together with a saved, named model file that can be used in other projects to embed new data.

Usage

The following examples show how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).