
# test_regression

> Evaluate a pretrained regression model on custom test data. 

Usually employed after the `train_regression` step.

> **Prediction Model**
>
> To use this step successfully, make sure the dataset you're predicting on is as similar
> as possible to the one the model was trained on. We check that the necessary columns and data
> types are present, but you should pay attention to how you handled these in the
> recipe in which the model was generated. Any changes might lead to a significant degradation in
> model performance.
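
As a rough illustration of the kind of compatibility check described above, the following Python sketch compares the columns and types of a training dataset with those of a test dataset. It is not the step's actual implementation: the function name, the schema dictionaries and the type names are all hypothetical.

```python
def missing_or_mismatched(train_schema, test_schema):
    """Compare two {column_name: type_name} mappings.

    Returns (missing, mismatched): columns absent from the test data, and
    columns present in both datasets but with a different type name.
    """
    missing = sorted(c for c in train_schema if c not in test_schema)
    mismatched = sorted(
        c for c, t in train_schema.items()
        if c in test_schema and test_schema[c] != t
    )
    return missing, mismatched


# Hypothetical schemas, for illustration only.
train_schema = {"age": "number", "city": "category", "label": "number"}
test_schema = {"age": "number", "city": "text"}

missing, mismatched = missing_or_mismatched(train_schema, test_schema)
print("missing:", missing)
print("mismatched:", mismatched)
```

A check like this only confirms that columns and types line up. It cannot detect differences in how the values were prepared, which is why the recipe that produced the model still deserves a careful review.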

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      Assuming we have reserved a test set containing data that wasn't used to train the model, we can simply pass it to this step for evaluation:

      ```stan theme={null}
      test_regression(ds_test, model, {"target": "label"}) -> (ds_test.pred, ds_test.error)
      ```
    </Tab>

    <Tab title="Example 2">
      If the test data is contained in a larger dataset (e.g. alongside training data), but can be identified using a column indicating the split, we can use the following setup:

      ```stan theme={null}
      test_regression(ds, model, {
        "target": "label",
        "refit": true,
        "split": {
          "column": "split_name",
          "train_split": "train",
          "test_split": "test"
        }
      }) -> (ds.pred, ds.error)
      ```
    </Tab>

    <Tab title="Example 3">
      Alternatively, we can create a randomized train/test split on the fly, re-fit the model on the train set, and evaluate on the test set. In this case an additional column will be added to the dataset, indicating the split each row belongs to:

      ```stan theme={null}
      test_regression(ds, model, {
        "target": "label",
        "refit": true,
        "split": {
          "test_size": 0.2
        }
      }) -> (ds.pred, ds.error, ds.split)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects to receive and the outputs it will produce. For further details see the sections below.

      ```stan theme={null}
      test_regression(ds: dataset, model: model_regression[ds], {
          "param": value,
          ...
      }) -> (pred: column, error: column, *split: column)
      ```
    </Tab>
  </Tabs>
</Accordion>
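
To make the two split setups from Examples 2 and 3 more concrete, here is a minimal Python sketch of how rows could be divided in each case. It is a conceptual illustration, not the step's implementation: the helper names are hypothetical, and `test_size` is assumed to be a fraction between 0 and 1.

```python
import random


def random_split(n_rows, test_size, seed=None):
    """Label each row "train" or "test" at random (as in Example 3).

    With seed=None the assignment differs between runs.
    """
    rng = random.Random(seed)
    n_test = round(n_rows * test_size)
    test_rows = set(rng.sample(range(n_rows), n_test))
    return ["test" if i in test_rows else "train" for i in range(n_rows)]


def rows_for_split(split_column, test_split, train_split=None):
    """Return (train_idx, test_idx) from an existing split column (as in Example 2).

    If train_split is None, every row outside the test split counts as train.
    """
    test_idx = [i for i, v in enumerate(split_column) if v == test_split]
    if train_split is None:
        train_idx = [i for i, v in enumerate(split_column) if v != test_split]
    else:
        train_idx = [i for i, v in enumerate(split_column) if v == train_split]
    return train_idx, test_idx


labels = random_split(10, 0.2, seed=7)

# Hypothetical split column with a third value besides train/test.
split_column = ["train", "train", "validation", "test", "test"]
with_train = rows_for_split(split_column, "test", "train")
without_train = rows_for_split(split_column, "test")
print(labels.count("test"), with_train, without_train)
```

Note how omitting `train_split` pulls the `"validation"` row into the train set, whereas naming it explicitly leaves that row out of both sets.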

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds" type="dataset" required>
    A dataset containing features and target columns.
  </ParamField>

  <ParamField path="model" type="file[model_regression[ds]]" required>
    Name of trained model to use for prediction.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="pred" type="column" required>
    Column containing the model predictions.
  </ParamField>

  <ParamField path="error" type="column" required>
    Column containing the model prediction errors.
  </ParamField>

  <ParamField path="*split" type="column">
    Optional column identifying the train/test split. Only produced if the dataset was split randomly and the model refit.
  </ParamField>
</Accordion>
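
The exact definition of the `error` column is not stated here. As a sketch only, assuming the error is the signed difference between prediction and target (the step may use a different convention), the per-row errors and two common summary metrics could be computed like this in Python. All names and values are illustrative.

```python
import math


def prediction_errors(targets, predictions):
    """Signed error per row, as prediction minus target (an assumed convention)."""
    return [p - t for t, p in zip(targets, predictions)]


def summarize(errors):
    """Mean absolute error and root mean squared error of a list of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"mae": mae, "rmse": rmse}


# Made-up values, for illustration only.
targets = [10.0, 12.0, 9.0, 15.0]
predictions = [11.0, 12.0, 7.0, 18.0]

errors = prediction_errors(targets, predictions)
summary = summarize(errors)
print(errors)
print(summary)
```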

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object passed as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="target" type="string (ds.column)" required>
    Target variable.
    Name of the column that contains your target values (labels).
  </ParamField>

  <ParamField path="split" type="[object, object]">
    Train/test split configuration.
    Identify the splits using an existing column or create a randomized split. In either case,
    the model will be evaluated on the test split. If `refit` is `true`, it is first refit on the train split.

    <Accordion title="Options">
      <Tabs>
        <Tab title="Create random split">
          <ParamField path="test_size" type="number">
            Size of test split.
            The fraction of data used for testing. The remaining data will be used to refit the model.

            Values must be in the following range:

            ```javascript theme={null}
            0.0 < test_size < inf
            ```
          </ParamField>

          <ParamField path="seed" type="[integer, null]">
            Random seed.
            Seed used to initialize the random number generator assigning rows to train/test splits.
            If none is provided, the result will be non-deterministic.
          </ParamField>
        </Tab>

        <Tab title="Use existing split column">
          <ParamField path="column" type="string (ds.column)" required>
            Split column.
            Name of the column that contains the split identifiers/names.
          </ParamField>

          <ParamField path="test_split" type="string" required>
            Test split identifier.
            Value of the split column that identifies the test set. Rows with this value will be used
            to evaluate the model. If no `train_split` parameter is provided, the remaining rows will be
            used to refit the model before evaluation.
          </ParamField>

          <ParamField path="train_split" type="string">
            Train split identifier.
            Value of the split column that identifies the train set. Rows with this value will be used
            to refit the model before evaluation. If not provided, all rows not belonging to the test split
            will be used in the refit.
          </ParamField>
        </Tab>
      </Tabs>
    </Accordion>
  </ParamField>

  <ParamField path="refit" type="boolean" default="false">
    Whether to retrain the model.
    If set to `true`, the model will be refit on the train split before evaluation. If set to `false`,
    the model will be evaluated on the test split without refitting. If no `split` configuration is provided,
    this parameter is ignored.
  </ParamField>
</Accordion>
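
The interaction between `split` and `refit` can be summarized with a small Python sketch. It follows the description of the `refit` parameter above and is not the step's actual code. The function name and the returned dictionary are hypothetical.

```python
def evaluation_plan(config):
    """Describe what the step would do for a given configuration.

    Follows the `refit` description: without a `split` the parameter is
    ignored, otherwise the model is refit only when `refit` is true.
    """
    split = config.get("split")
    refit = config.get("refit", False)  # documented default is false
    if split is None:
        # No split configured: evaluate the pretrained model on the data passed in.
        return {"refit": False, "evaluate_on": "all rows"}
    return {"refit": bool(refit), "evaluate_on": "test split"}


print(evaluation_plan({"target": "label"}))
print(evaluation_plan({"target": "label", "refit": True}))
print(evaluation_plan({"target": "label", "refit": True, "split": {"test_size": 0.2}}))
print(evaluation_plan({"target": "label", "split": {"column": "split_name", "test_split": "test"}}))
```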
