# train_survival

> Train and store a survival model to be loaded at a later point for prediction. 

Trains a survival model using the Cox proportional hazards (CoxPH) model.

The step always produces two outputs: a new column with the trained model's predictions on the training data,
and a saved, named model file that can be used in other projects to make predictions on new data.

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      Train a Cox Proportional Hazard survival model

      ```stan theme={null}
      train_survival(ds, {"target": ["event_observed", "duration"]}) -> (ds.predicted_survival, "my-survival-model")
      ```
    </Tab>

    <Tab title="Example 2">
      Train a CoxPH model with penalization and median survival time prediction

      ```stan theme={null}
      train_survival(ds, {"target": ["event_observed", "time_to_event"], "predictions": {"kind": "median"}, "params": {"penalizer": 0.1}}) -> (ds.predicted_survival, "my-survival-model")
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects and the outputs it produces. For further details see the sections below.

      ```stan theme={null}
      train_survival(ds: dataset, {
          "param": value,
          ...
      }) -> (predicted: number, model: model_survival[ds])
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds" type="dataset" required>
    Should contain the target columns (see `target` parameter below) and the feature columns you wish to use in the model.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="predicted" type="column[number]" required>
    Name for output column containing model predictions.
  </ParamField>

  <ParamField path="model" type="file[model_survival[ds]]" required>
    Zip file containing the trained model and associated information.
  </ParamField>

  <ParamField path="info" type="file.hidden" required />
</Accordion>

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a json object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <Tabs>
    <Tab title="CoxPH">
      <ParamField path="model" type="string" default="CoxPH">
        Kind of survival model to train.
        "CoxPH" trains a [lifelines Cox proportional hazards model](https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#cox-s-proportional-hazard-model).
      </ParamField>

      <ParamField path="target" type="array">
        Target variables.
        Exactly two names, identifying the target columns that contain, in this order:

        1. whether the event was observed (boolean) and
        2. the time (duration) to event or censoring (number).

        <Accordion title="Array items">
          <ParamField path="Item 0" type="string (ds.column:boolean)" />

          <ParamField path="Item 1" type="string (ds.column:number)" />
        </Accordion>
      </ParamField>

      <ParamField path="predictions" type="object">
        Configure the kind of predictions to return.

        <Accordion title="Properties">
          <ParamField path="kind" type="string" default="median">
            Kind of prediction.
            `median` returns the median survival time. `percentile` returns the survival time at
            the given percentile. `expectation` returns the expected survival time.
            `survival_function` returns the whole survival function (one series per sample).

            Values must be one of the following:

            * `median`
            * `percentile`
            * `expectation`
            * `survival_function`
          </ParamField>

          <ParamField path="percentile" type="number" default="0.5">
            Percentile to use when `kind` is set to `percentile`.

            Values must be in the following range:

            ```javascript theme={null}
            0 ≤ percentile ≤ 1
            ```
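
            As a minimal sketch, reusing the column and model names from the usage examples (placeholders for your own), the following requests the survival time at the `0.25` percentile rather than the median:

            ```stan theme={null}
            train_survival(ds, {
                "target": ["event_observed", "duration"],
                "predictions": {"kind": "percentile", "percentile": 0.25}
            }) -> (ds.predicted_survival, "my-survival-model")
            ```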
          </ParamField>

          <ParamField path="times" type="[array, object]">
            Points in time to predict.
            Configures at which points to predict when `kind` is set to `survival_function`.
            Either an explicit array of durations, or an object specifying a duration step size and
            maximum duration.

            <Accordion title="Options">
              <Tabs>
                <Tab title="Explicit times">
                  <ParamField path="{_}" type="array[number]">
                    Array of times/durations.
                    Will predict the survival function at each of the durations. E.g. `[1, 2, 3, 4, 5]`.

                    <Accordion title="Array items">
                      <ParamField path="Item" type="number">
                        Each item in array.
                      </ParamField>
                    </Accordion>
                  </ParamField>
                </Tab>

                <Tab title="Step enumeration">
                  <ParamField path="step" type="number" default="1">
                    Step size.
                    The step size in the enumeration of durations. E.g. `1` will predict the survival
                    function at each integer duration.

                    Values must be in the following range:

                    ```javascript theme={null}
                    0 < step < inf
                    ```
                  </ParamField>

                  <ParamField path="max" type="[number, null]">
                    Maximum duration.
                    If not provided, or `null`, the maximum duration in the dataset is used.
                  </ParamField>
                </Tab>
              </Tabs>
            </Accordion>
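
            The two options can be sketched as follows. The column names, the output name `ds.survival_curve` and the model name are placeholders, and the durations are arbitrary illustrative values. With an explicit array of durations:

            ```stan theme={null}
            train_survival(ds, {
                "target": ["event_observed", "duration"],
                "predictions": {"kind": "survival_function", "times": [1, 2, 3, 4, 5]}
            }) -> (ds.survival_curve, "my-survival-model")
            ```

            Or with a step size and maximum duration. Setting `max` to `null`, or omitting it, falls back to the maximum duration in the dataset:

            ```stan theme={null}
            train_survival(ds, {
                "target": ["event_observed", "duration"],
                "predictions": {"kind": "survival_function", "times": {"step": 1, "max": 12}}
            }) -> (ds.survival_curve, "my-survival-model")
            ```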
          </ParamField>
        </Accordion>
      </ParamField>

      <ParamField path="params" type="object">
        Model parameters.

        <Accordion title="Properties">
          <ParamField path="alpha" type="number" default="0.05">
            Significance level for the confidence intervals. The default of `0.05` gives 95% confidence intervals.
          </ParamField>

          <ParamField path="penalizer" type="number" default="0.0">
            Penalizer strength.
            Attaches a penalty to the size of the coefficients during regression. With the default `l1_ratio` of `0.0` this is a pure L2 penalty.
            Penalization improves the stability of the estimates and controls for high correlation between covariates.

            Values must be in the following range:

            ```javascript theme={null}
            0.0 ≤ penalizer < inf
            ```
          </ParamField>

          <ParamField path="l1_ratio" type="number" default="0.0">
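            L1 vs L2 penalty ratio.
            Specifies how the penalty is split between L1 (lasso) and L2 (ridge), following the scikit-learn
            convention: `0.0` is a pure L2 penalty and `1.0` is a pure L1 penalty. Only has an effect when `penalizer` is greater than 0.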

            Values must be in the following range:

            ```javascript theme={null}
            0.0 ≤ l1_ratio ≤ 1.0
            ```
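
            As an illustrative sketch (column and model names are placeholders, and the values are arbitrary rather than recommended), combining `penalizer` with an `l1_ratio` of `0.5` would, under the scikit-learn convention, weight the L1 and L2 terms equally:

            ```stan theme={null}
            train_survival(ds, {
                "target": ["event_observed", "duration"],
                "params": {"penalizer": 0.1, "l1_ratio": 0.5}
            }) -> (ds.predicted_survival, "my-survival-model")
            ```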
          </ParamField>

          <ParamField path="strata" type="array[string]">
            Columns to use in stratification.
            This is useful if a categorical covariate does not satisfy the proportional hazards assumption.

            <Accordion title="Array items">
              <ParamField path="Item" type="string (ds.column:categorical)">
                Each item in array.
              </ParamField>
            </Accordion>
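
            A minimal sketch, assuming the dataset contains a categorical column named `plan_type` (a hypothetical name used only for illustration) that should be used for stratification:

            ```stan theme={null}
            train_survival(ds, {
                "target": ["event_observed", "duration"],
                "params": {"strata": ["plan_type"]}
            }) -> (ds.predicted_survival, "my-survival-model")
            ```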
          </ParamField>

          <ParamField path="baseline_estimation_method" type="string" default="breslow">
            How the fitter should estimate the baseline hazard.

            Values must be one of the following:

            * `breslow`
            * `spline`
            * `piecewise`
          </ParamField>
        </Accordion>
      </ParamField>
    </Tab>
  </Tabs>
</Accordion>
