embed_with_trees

Usually employed after the train_classification or train_regression steps with RandomForest/ExtraTrees/CatBoost models.

???+ info "Prediction Model"
    To use this step successfully, make sure the dataset you're predicting on is as similar as possible to the one the model was trained on. We check that the necessary data types and columns are present, but you should pay attention to how you handled these in the recipe in which the model was generated. Any changes might lead to a significant degradation in model performance.

This step needs a pre-trained RandomForest, ExtraTrees or CatBoost model, trained through the train_classification or train_regression steps on the same dataset that is used as input. Each data point in the dataset is passed through every tree in the forest, and the leaf node it ends up in is recorded for each tree. The indices of these leaf nodes across all trees are then used to form a sparse, high-dimensional representation of each data point. This representation can be thought of as an embedding: two data points that fall into the same leaves in many trees end up close together, so position in this space reflects similarity as determined by the structure of the trees in the model. For more detail on the method, see the apply method of scikit-learn's forest classes and its usage.
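The leaf-index embedding described above can be sketched directly with scikit-learn. This is a minimal illustration of the general technique, not the step's actual implementation: the synthetic dataset, the model settings and the use of OneHotEncoder to build the sparse representation are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the input dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Stand-in for the pre-trained model; in a recipe this would come from
# a train_classification or train_regression step.
n_trees = 10
forest = RandomForestClassifier(
    n_estimators=n_trees, max_depth=4, random_state=0
).fit(X, y)

# apply() returns, for each sample, the index of the leaf it reaches
# in each tree: an array of shape (n_samples, n_trees).
leaves = forest.apply(X)

# One-hot encode the leaf indices per tree. Each row has exactly one
# active column per tree, giving a sparse high-dimensional embedding.
encoder = OneHotEncoder(handle_unknown="ignore")
embedding = encoder.fit_transform(leaves)

# Fraction of trees in which two samples share a leaf: a simple
# similarity measure implied by the embedding.
shared = (embedding @ embedding.T).toarray() / n_trees

print(leaves.shape, embedding.shape)
```

Each row of `embedding` contains one non-zero entry per tree, so the dot product of two rows counts the trees in which both samples land in the same leaf.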

Usage

The following example shows how the step can be used in a recipe.

Examples

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters
