Usually employed after the train_classification step.

Prediction Model: To use this step successfully, make sure the dataset you're predicting on is as similar as possible to the one the model was trained on. The step checks that the necessary data types and columns are present, but you should pay attention to how you handled these in the recipe where the model was generated. Any changes might lead to a significant degradation in model performance.

Usage

The following examples show how the step can be used in a recipe.

Examples

Assuming we have reserved a test set containing data that wasn’t used to train the model, we can simply pass it to this step for evaluation:
test_classification(ds_test, model, {"target": "label"}) -> (ds_test.pred, ds_test.prob, ds_test.error)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds
dataset
required
A dataset containing features and target columns.
model
file[model_classification[ds]]
required
Name of trained model to use for prediction.
pred
column
required
Column containing the model predictions.
prob
column
required
Column containing the model prediction probabilities.
error
column
required
Column containing the model prediction errors.
*split
column
Optional column identifying the train/test split, if dataset was randomly sampled and model re-fit.

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

target
string (ds.column:category|boolean)
required
Target variable. Name of the column that contains your target values (labels).
positive_class
[string, null]
Name of the positive class. In binary classification, this is usually the class you’re most interested in, for example the class corresponding to successful lead conversion in a lead scoring model, or the class corresponding to a customer who has churned in a churn prediction model. If provided, the step returns predicted probabilities for the positive class. If not provided, it returns probabilities for the predicted class (i.e. the class with the highest probability).
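The effect of positive_class on the prob output can be illustrated with a minimal Python sketch. This is not the step's implementation; the function name reported_probability and the per-row dictionary of class probabilities are assumptions made purely for illustration.

```python
def reported_probability(class_probs, positive_class=None):
    """Pick the probability to report for a single row.

    class_probs is a hypothetical mapping from class label to the
    predicted probability of that class for one row.
    """
    if positive_class is not None:
        # A positive class was named: always report its probability,
        # even when another class is the predicted one.
        return class_probs[positive_class]
    # No positive class: report the probability of the predicted class,
    # i.e. the class with the highest probability.
    return max(class_probs.values())


row = {"churned": 0.3, "retained": 0.7}
with_positive = reported_probability(row, positive_class="churned")
without_positive = reported_probability(row)
```

With positive_class set to "churned", the reported value for this row is the probability of "churned" even though "retained" is the predicted class; without it, the reported value is the probability of "retained".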
split
[object, object]
Train/test split configuration. Identify the splits using an existing column or create a randomized split. In either case, the model will be refit on the train split and evaluated on the test split.
  • Create random split
  • Use existing split column
test_size
number
Size of test split. The fraction of data used for testing. The remaining data will be used to refit the model. Values must be in the following range:
0.0 < test_size < inf
seed
[integer, null]
Random seed. Seed used to initialize the random number generator assigning rows to train/test splits. If none is provided, result will be non-deterministic.
refit
boolean
default:"false"
Whether to retrain the model. If set to true, the model will be refit on the train split before evaluation. If set to false, the model will be evaluated on the test split without refitting. If no split configuration is provided, this parameter is ignored.
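To make the roles of test_size and seed concrete, here is a minimal Python sketch of a seeded random train/test assignment. It uses only the standard library and is an illustration under stated assumptions, not how the step itself assigns rows; the function name assign_split and the "train"/"test" labels are hypothetical.

```python
import random


def assign_split(n_rows, test_size, seed=None):
    """Label each of n_rows rows as "train" or "test".

    test_size is the fraction of rows placed in the test split.
    With seed=None the generator is seeded from system entropy,
    so repeated calls can give different assignments.
    """
    rng = random.Random(seed)
    n_test = round(n_rows * test_size)
    indices = list(range(n_rows))
    rng.shuffle(indices)  # random order, reproducible for a fixed seed
    test_rows = set(indices[:n_test])
    return ["test" if i in test_rows else "train" for i in range(n_rows)]


first = assign_split(10, test_size=0.3, seed=42)
second = assign_split(10, test_size=0.3, seed=42)
```

Calling the function twice with the same seed yields the same assignment, which is why fixing seed makes an evaluation repeatable. In this sketch a model would be refit on the rows labelled "train" and evaluated on the rows labelled "test".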