Test regression
inference • models • classification • evaluation
Evaluate a pretrained regression model on custom test data.
Usually employed after the train_regression step.
Prediction Model
To use this step successfully, make sure the dataset you're predicting on is as similar as possible to the one the model was trained on. We check that the necessary columns and data types are present, but you should pay attention to how you handled them in the recipe that generated the model. Any changes may lead to a significant degradation in model performance.
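A presence-and-type check like the one described above catches missing or retyped columns, but not columns that were preprocessed differently. A minimal sketch of such a check, assuming schemas are represented as hypothetical column-to-type dictionaries (this is an illustration, not the platform's actual internals):

```python
# Hypothetical sketch: compare the columns/types a model was trained on
# with those of the dataset we want to predict on.


def check_schema(train_schema: dict[str, str], test_schema: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every training column is present with the same type."""
    problems = []
    for column, dtype in train_schema.items():
        if column not in test_schema:
            problems.append(f"missing column: {column}")
        elif test_schema[column] != dtype:
            problems.append(f"type mismatch in {column}: expected {dtype}, got {test_schema[column]}")
    return problems


train = {"age": "number", "city": "category"}
test = {"age": "number", "city": "text"}
print(check_schema(train, test))
# ['type mismatch in city: expected category, got text']
```

Note that a clean result here says nothing about whether values were cleaned, encoded or scaled the same way as in the training recipe.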
Usage
The following are the step's expected inputs and outputs and their specific types.
test_regression(
    ds: dataset,
    model: model_classification[ds],
    {
        "param": value
    }
) -> (
    pred: category,
    error: bool,
    *split: category
)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example
Assuming we have reserved a test set containing data that wasn't used to train the model, we can simply pass it to this step for evaluation:
test_regression(ds_test, model, {"target": "label"}) -> (ds_test.pred, ds_test.error)
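Conceptually, evaluating on a reserved test set means the pretrained model only predicts; nothing is refit. A minimal Python sketch, assuming the model is any callable mapping a row to a number (the model below is a hypothetical stand-in). The mean absolute error is shown only as a typical summary metric and is not one of this step's documented outputs:

```python
# Sketch: evaluate an already-trained regression model on held-out rows.
from typing import Callable

Row = dict[str, float]


def evaluate(model: Callable[[Row], float], test_rows: list[Row], target: str) -> tuple[list[float], float]:
    """Predict each test row (no refitting) and return (predictions, mean absolute error)."""
    preds = [model(row) for row in test_rows]
    mae = sum(abs(p - row[target]) for p, row in zip(preds, test_rows)) / len(test_rows)
    return preds, mae


def pretrained(row: Row) -> float:
    # Hypothetical model fitted earlier: label is approximately 2 * x + 1
    return 2 * row["x"] + 1


test_rows = [{"x": 1.0, "label": 3.5}, {"x": 2.0, "label": 4.0}]
preds, mae = evaluate(pretrained, test_rows, target="label")
print(preds, mae)  # [3.0, 5.0] 0.75
```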
More examples
If the test data is contained in a larger dataset (e.g., alongside the training data) but can be identified using a column indicating the split, we can use the following setup:
test_regression(ds, model, {
    "target": "label",
    "refit": true,
    "split": {
        "column": "split_name",
        "train_split": "train",
        "test_split": "test"
    }
}) -> (ds.pred, ds.error)
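The rules for test_split and train_split (described under Parameters) can be sketched in plain Python, assuming rows are dictionaries. This mirrors the documented behavior but is a hypothetical illustration, not the step's implementation. Note that when train_split is set, rows carrying any other value (here "validation") are used neither for refitting nor as test rows in this sketch:

```python
# Hypothetical sketch of the column-based split rules; rows are plain dicts.
from typing import Optional


def split_by_column(rows: list[dict], column: str, test_split: str,
                    train_split: Optional[str] = None) -> tuple[list[dict], list[dict]]:
    """Return (train_rows, test_rows)."""
    # Rows whose split value equals test_split form the test set.
    test = [r for r in rows if r[column] == test_split]
    if train_split is None:
        # No train_split given: every non-test row is used for refitting.
        train = [r for r in rows if r[column] != test_split]
    else:
        # train_split given: only rows carrying that value are used for refitting.
        train = [r for r in rows if r[column] == train_split]
    return train, test


rows = [
    {"id": 1, "split_name": "train"},
    {"id": 2, "split_name": "test"},
    {"id": 3, "split_name": "validation"},
]
train, test = split_by_column(rows, "split_name", test_split="test", train_split="train")
print([r["id"] for r in train], [r["id"] for r in test])  # [1] [2]

train_all, _ = split_by_column(rows, "split_name", test_split="test")
print([r["id"] for r in train_all])  # [1, 3]
```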
Alternatively, we can create a randomized train/test split on the fly, refit the model on the train set, and evaluate it on the test set. In this case, an additional column is added to the dataset indicating the split each row belongs to:
test_regression(ds, model, {
    "target": "label",
    "refit": true,
    "split": {
        "test_size": 0.2
    }
}) -> (ds.pred, ds.error, ds.split)
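A randomized split with a fixed seed is reproducible: the same seed always yields the same assignment, while omitting it gives a different split on each run. A minimal sketch using Python's random module, assuming test_size is a fraction between 0 and 1 and that exactly round(n_rows * test_size) rows go to the test split (the step's actual sampling strategy isn't documented here). The returned labels play the role of the split column:

```python
# Hypothetical sketch of a seeded random train/test assignment.
import random
from typing import Optional


def random_split(n_rows: int, test_size: float, seed: Optional[int] = None) -> list[str]:
    """Label each of n_rows rows "train" or "test".

    Assumes test_size is a fraction in (0, 1); exactly round(n_rows * test_size)
    rows are labelled "test".
    """
    rng = random.Random(seed)  # seed=None: seeded from OS randomness, so not reproducible
    n_test = round(n_rows * test_size)
    labels = ["test"] * n_test + ["train"] * (n_rows - n_test)
    rng.shuffle(labels)  # shuffling the labels randomly assigns rows to splits
    return labels


split = random_split(10, test_size=0.2, seed=42)
print(split.count("test"))                      # 2
print(split == random_split(10, 0.2, seed=42))  # True: same seed, same split
```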
Inputs
ds: dataset
A dataset containing features and target columns.
model: file:model_classification[ds]
Name of the trained model to use for prediction.
Outputs
pred: column:category
Column containing the model predictions.
error: column:bool
Column containing the model prediction errors.
*split: column:category
Optional column identifying the train/test split, added if the dataset was randomly split and the model refit.
Parameters
target: string
Target variable. Name of the column that contains your target values (labels).
split: object | object
Train/test split configuration. Either identify the splits using an existing column or create a randomized split. In either case, the model is evaluated on the test split; whether it is first refit on the train split is controlled by the refit parameter.
Items in split
test_size: number
Size of test split. The fraction of data used for testing. The remaining data will be used to refit the model.
Range: 0.0 < test_size < inf
seed: integer | null
Random seed. Seed used to initialize the random number generator that assigns rows to the train/test splits. If none is provided, the result will be non-deterministic.
column: string
Split column. Name of the column that contains the split identifiers/names.
test_split: string
Test split identifier. Value of the split column that identifies the test set. Rows with this value will be used to evaluate the model. If no train_split parameter is provided, the remaining rows will be used to refit the model before evaluation.
train_split: string
Train split identifier. Value of the split column that identifies the train set. Rows with this value will be used to refit the model before evaluation. If not provided, all rows not belonging to the test split will be used in the refit.
refit: boolean = false
Whether to retrain the model. If set to true, the model will be refit on the train split before evaluation. If set to false, the model will be evaluated on the test split without refitting. If no split configuration is provided, this parameter is ignored.
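How refit interacts with the split configuration can be summarized as a small decision function. This is a hypothetical sketch of the documented rules, assuming that a split object containing a "column" key is the column-based variant and any other split object is the randomized variant, and that without a split configuration the whole input dataset is scored (as in the first example):

```python
# Hypothetical decision table for the refit/split parameters.
from typing import Optional


def plan_evaluation(split: Optional[dict], refit: bool = False) -> str:
    """Describe what happens for a given configuration."""
    if split is None:
        # No split configuration: refit is ignored and all rows are scored.
        return "evaluate pretrained model on all rows"
    kind = "column" if "column" in split else "random"
    if refit:
        return f"refit on train split ({kind}), evaluate on test split"
    # refit is false: the pretrained model is applied to the test split as-is.
    return f"evaluate pretrained model on test split ({kind})"


print(plan_evaluation(None, refit=True))
print(plan_evaluation({"test_size": 0.2}, refit=True))
print(plan_evaluation({"column": "split_name", "test_split": "test"}))
```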