train_survival

Trains a survival model using the Cox proportional hazards model.

The step always outputs a new column containing the trained model’s predictions on the training data, together with a saved and named model file that can be used in other projects to predict on new data.

Usage

The following shows how the step can be used in a recipe.

Examples

General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details, see the sections below.

train_survival(ds: dataset, {
    "param": value,
    ...
}) -> (predicted: number, model: model_survival[ds])

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

ds: dataset

Outputs

predicted: number
model: model_survival[ds]

Configuration

The following parameters configure the behaviour of the step. Include them in a JSON object passed as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

model

string

default:"CoxPH"

Kind of survival model to train. “CoxPH” trains a lifelines Cox proportional hazards model.

target

array

Target variables. Exactly two column names, in the following order:

whether the event was observed (boolean), and
the time (duration) to the event or to censoring (number).
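
To make the two target columns concrete, the following is a minimal sketch of how they might be derived from raw start and end dates before the step is used. The function name, the cutoff date and the example dates are all hypothetical; the only thing taken from the description above is the pair (event observed, duration).

```python
from datetime import date

def survival_targets(start, end, cutoff):
    """Return (observed, duration_in_days) for one record.

    `end` is None when the event has not happened yet; such records
    are censored at `cutoff` (the date the data was extracted).
    """
    observed = end is not None and end <= cutoff
    stop = end if observed else cutoff
    return observed, (stop - start).days

cutoff = date(2024, 1, 1)  # hypothetical extraction date

# Event observed: duration runs from start to the event.
row_observed = survival_targets(date(2023, 1, 1), date(2023, 4, 11), cutoff)

# Event not yet observed: duration runs from start to the cutoff (censoring).
row_censored = survival_targets(date(2023, 6, 1), None, cutoff)
```

In this sketch the first element would go into the boolean target column and the second into the numeric duration column, in that order.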

predictions

object

Configure the kind of predictions to return.

Properties

kind

string

default:"median"

Kind of prediction. median returns the median survival time. percentile returns the survival time at the given percentile. expectation returns the expected survival time. survival_function returns the whole survival function (one series per sample).

Values must be one of the following:

median
percentile
expectation
survival_function
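
The kinds of prediction can all be read off a single survival function. The sketch below illustrates this on a made-up curve using plain Python. It assumes that the percentile refers to the survival probability at which the curve is read (so 0.5 gives the median), and that the expectation is approximated by the area under the curve; the step's actual computation may differ in detail.

```python
import math

def time_at_survival(times, survival, p=0.5):
    """First time at which the survival probability has dropped to p or below."""
    for t, s in zip(times, survival):
        if s <= p:
            return t
    return math.inf  # the curve never reaches p within the given times

def expected_time(times, survival):
    """Area under the survival curve, using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (survival[i - 1] + survival[i]) * width
    return area

# Made-up survival function for a single sample.
times = [0, 1, 2, 3, 4]
survival = [1.0, 0.8, 0.5, 0.25, 0.0]

median = time_at_survival(times, survival)      # kind: median
p25 = time_at_survival(times, survival, 0.25)   # kind: percentile, percentile: 0.25
mean = expected_time(times, survival)           # kind: expectation
```

With kind set to survival_function, the prediction would correspond to the whole survival list rather than a single number derived from it.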

percentile

number

default:"0.5"

Percentile to use when kind is set to percentile.

Values must be in the following range:

0 ≤ percentile ≤ 1

times

[array, object]

Points in time to predict. Configures at which points to predict when kind is set to survival_function. Either an explicit array of durations, or an object specifying a duration step size and maximum duration.
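
The two forms of times describe the same thing: a list of durations. As a rough illustration, the sketch below expands the object form into an explicit array. The key names "step" and "max" are hypothetical (the actual names are not given here), as is the choice to start at one step rather than at zero.

```python
def expand_times(times):
    """Turn a `times` setting into an explicit list of durations."""
    if isinstance(times, dict):
        step = times["step"]    # hypothetical key name
        maximum = times["max"]  # hypothetical key name
        if step <= 0:
            raise ValueError("step must be positive")
        count = int(maximum // step)
        return [step * i for i in range(1, count + 1)]
    # Already an explicit array of durations.
    return list(times)

from_object = expand_times({"step": 30, "max": 120})
from_array = expand_times([7, 30, 90])
```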

params

object

Model parameters.

Properties

alpha

number

default:"0.05"

Significance level for the confidence intervals. The default of 0.05 gives 95% intervals.
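
As an illustration of how alpha maps to an interval width, the sketch below builds a two-sided interval around an estimate using a normal approximation. The estimate and standard error are made-up numbers, and the function name is hypothetical; it is only meant to show that alpha = 0.05 corresponds to roughly ±1.96 standard errors.

```python
from statistics import NormalDist

def normal_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return estimate - z * std_error, estimate + z * std_error

# Made-up coefficient estimate and standard error.
low, high = normal_interval(0.40, 0.10)
narrow_low, narrow_high = normal_interval(0.40, 0.10, alpha=0.32)
```

A larger alpha gives a narrower interval, since less coverage is requested.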

penalizer

number

default:"0.0"

Penalizer strength. Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates.

Values must be in the following range:

0.0 ≤ penalizer < inf

l1_ratio

number

default:"0.0"

L1 vs L2 penalty ratio. Specifies what share of the penalty is assigned to L1 (lasso) versus L2 (ridge): 0.0 is a pure L2 penalty and 1.0 is a pure L1 penalty. Follows the scikit-learn convention.

Values must be in the following range:

0.0 ≤ l1_ratio ≤ 1.0
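
To show how penalizer and l1_ratio interact, here is a small sketch of the elastic-net penalty term written in the scikit-learn convention. It only computes the penalty for a given coefficient vector; the function name and the coefficients are made up, and the fitter's internal implementation may differ in detail.

```python
def elastic_net_penalty(coefs, penalizer=0.0, l1_ratio=0.0):
    """Penalty added to the loss, in the scikit-learn elastic-net convention."""
    l1 = sum(abs(b) for b in coefs)   # L1 norm (lasso part)
    l2 = sum(b * b for b in coefs)    # squared L2 norm (ridge part)
    return penalizer * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)

coefs = [1.0, -2.0]  # made-up coefficients

pure_l2 = elastic_net_penalty(coefs, penalizer=0.1, l1_ratio=0.0)
pure_l1 = elastic_net_penalty(coefs, penalizer=0.1, l1_ratio=1.0)
mixed = elastic_net_penalty(coefs, penalizer=0.1, l1_ratio=0.5)
```

With the default penalizer of 0.0 the penalty vanishes regardless of l1_ratio, so the ratio only matters once penalizer is positive.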

strata

array[string]

Columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazards assumption.

baseline_estimation_method

string

default:"breslow"

How the fitter should estimate the baseline hazard.

Values must be one of the following:

breslow
spline
piecewise
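
Putting the parameters together, the sketch below builds a complete configuration as a Python dictionary, checks it against the ranges and allowed values documented above, and prints it as the JSON object that would be passed to the step. The column names are hypothetical, the nesting (percentile under predictions, strata under params) is inferred from the parameter listing, and the check function is an illustration rather than part of the step.

```python
import json

config = {
    "target": ["churned", "tenure_days"],  # hypothetical column names
    "predictions": {"kind": "percentile", "percentile": 0.25},
    "params": {
        "penalizer": 0.1,
        "l1_ratio": 0.5,
        "strata": ["plan"],  # hypothetical column name
        "baseline_estimation_method": "breslow",
    },
}

KINDS = {"median", "percentile", "expectation", "survival_function"}
BASELINES = {"breslow", "spline", "piecewise"}

def check_config(cfg):
    """Return a list of violations of the documented constraints."""
    problems = []
    if len(cfg.get("target", [])) != 2:
        problems.append("target must contain exactly two column names")
    pred = cfg.get("predictions", {})
    if pred.get("kind", "median") not in KINDS:
        problems.append("unknown predictions.kind")
    if not 0 <= pred.get("percentile", 0.5) <= 1:
        problems.append("predictions.percentile must be between 0 and 1")
    params = cfg.get("params", {})
    if params.get("penalizer", 0.0) < 0:
        problems.append("params.penalizer must not be negative")
    if not 0 <= params.get("l1_ratio", 0.0) <= 1:
        problems.append("params.l1_ratio must be between 0 and 1")
    if params.get("baseline_estimation_method", "breslow") not in BASELINES:
        problems.append("unknown params.baseline_estimation_method")
    return problems

problems = check_config(config)
print(json.dumps(config, indent=4))
```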
