This step calculates the duration between a start date and an end date and determines whether an event was observed. The output consists of two columns:
  • duration: The time interval between the start and end dates in the specified unit (default: days).
  • observed: A boolean column indicating whether the event was observed (i.e., whether the end date falls on or before the observation end date).
This is particularly useful for preparing input data for survival analysis, such as Kaplan-Meier curves, where the event observation (censoring) status and duration are key inputs.
  • If either start_date or end_date is missing (null), observed will be false, and duration will be null.
  • Otherwise, the duration is calculated as the interval between start_date and end_date.
  • When both dates are present, observed is true if end_date <= observation_end and false otherwise.
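The rules above can be sketched for a single row in plain Python. This is only an illustration of the described behaviour, not the step's actual implementation: the function name `observed_duration_sketch` is hypothetical, the unit is fixed to days, and the handling of `absolute` is an assumption based on its parameter description.

```python
from datetime import date
from typing import Optional, Tuple


def observed_duration_sketch(
    start: Optional[date],
    end: Optional[date],
    observation_end: date,
    absolute: bool = False,
) -> Tuple[Optional[int], bool]:
    """Return (duration in days, observed) for one row, following the rules above."""
    # Rule 1: a missing start or end date gives a null duration and observed = False.
    if start is None or end is None:
        return None, False

    # Rule 2: the duration is the interval between start and end (here, whole days).
    days = (end - start).days
    if absolute:
        # Assumption: "absolute" simply drops the sign of the interval.
        days = abs(days)

    # Rule 3: the event counts as observed if it ends on or before the cutoff.
    return days, end <= observation_end


cutoff = date(2020, 1, 1)
print(observed_duration_sketch(date(2019, 1, 1), date(2019, 1, 31), cutoff))
print(observed_duration_sketch(date(2019, 1, 1), None, cutoff))
```

The second call models an ongoing case: with no end date, the duration is null and the event is not observed.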

Usage

The following examples show how the step can be used in a recipe.

Examples

Calculate the duration and observation status between a start date and end date.
observed_duration(ds.start_date, ds.end_date, {"observation_end": "2020-01-01"}) -> (ds.duration, ds.observed)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
start_date
column[date]
required
The column containing the start date for each entry.
end_date
column[date]
required
The column containing the end date for each entry (can be null for ongoing cases).
duration
column[number]
required
The calculated duration between the start and end dates in the specified unit.
observed
column[boolean]
required
A boolean column indicating whether the event was observed on or before observation_end.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

observation_end
string
required
Observation end date. The cutoff date used to determine whether the event (e.g., churn or any other event) was observed, given as a string such as "2020-01-01".
absolute
boolean
default:"false"
Absolute value for duration. Whether to use absolute values for the calculated duration.
unit
string
default:"days"
Unit for duration. The unit of measurement for the duration. Allowed values are:
  • “Y”, “year”
  • “Q”, “quarter”
  • “M”, “month”
  • “W”, “week”
  • “D”, “day”
  • “h”, “hour”
  • “m”, “minute”
  • “s”, “second”
  • “ms”, “millisecond”
The unit name can be spelled in singular or plural, and in lowercase or with a capitalized first letter (for example “day”, “Day”, “days” or “Days”). The short codes are case-sensitive: “M” means month and “m” means minute.
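As an illustration of the accepted spellings, the following sketch builds a lookup from every allowed value to its short code. The helper `normalize_unit` and the lookup table are hypothetical and not part of the step; they only mirror the list of allowed values given for the unit parameter.

```python
# Short codes and their full unit names, as listed for the "unit" parameter.
SHORT_TO_NAME = {
    "Y": "year", "Q": "quarter", "M": "month", "W": "week", "D": "day",
    "h": "hour", "m": "minute", "s": "second", "ms": "millisecond",
}

# Map every accepted spelling (short code, singular/plural, lowercase/capitalized)
# to its short code. Short codes are kept case-sensitive so "M" and "m" stay distinct.
UNIT_LOOKUP = {}
for code, name in SHORT_TO_NAME.items():
    for spelling in (code, name, name.capitalize(), name + "s", name.capitalize() + "s"):
        UNIT_LOOKUP[spelling] = code


def normalize_unit(unit: str) -> str:
    """Return the short code for an allowed unit spelling, or raise ValueError."""
    try:
        return UNIT_LOOKUP[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


print(normalize_unit("Days"))     # short code for day
print(normalize_unit("minutes"))  # short code for minute
```

Keeping the lookup exact rather than lowercasing the input avoids confusing the month code with the minute code.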