Replace missing

missing values · NaN

Replace missing values (NaNs) with either a specified constant value or the result of a given function.

Example

The following configuration fills all missing values with the string "unknown":

replace_missing(ds.occupation, {"value": "unknown"}) -> (ds.occupation_filled)

More examples

The following configuration fills all missing values with the maximum of the column:

replace_missing(ds.numbers, {"function": "max"}) -> (ds.numbers_filled)
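
To make the effect of the two configurations above concrete, here is a minimal plain-Python sketch of the same idea. It is not the step's actual implementation: the helper names (`is_missing`, `fill_constant`, `fill_max`) and the sample data are hypothetical, and treating both `None` and NaN as missing is an assumption.

```python
import math


def is_missing(x):
    """Assumption: both None and float NaN count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def fill_constant(column, value):
    """Return a copy of the column with missing entries replaced by `value`."""
    return [value if is_missing(x) else x for x in column]


def fill_max(column):
    """Return a copy of the column with missing entries replaced by its maximum."""
    present = [x for x in column if not is_missing(x)]
    if not present:
        # Nothing to compute a maximum from; return an unchanged copy.
        return list(column)
    return fill_constant(column, max(present))


# Hypothetical sample columns mirroring the two examples above.
occupation = ["nurse", None, "teacher", float("nan")]
occupation_filled = fill_constant(occupation, "unknown")

numbers = [3.0, float("nan"), 7.5, None]
numbers_filled = fill_max(numbers)

print(occupation_filled)  # ['nurse', 'unknown', 'teacher', 'unknown']
print(numbers_filled)     # [3.0, 7.5, 7.5, 7.5]
```

Both helpers build a new list, so the input column is left untouched, in line with the step writing its result to a separate output column.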

Usage

The following are the step's expected inputs and outputs and their specific types.

replace_missing(input: column, {"param": value}) -> (output: column)

where the object {"param": value} is optional in most cases; if present, it may contain any of the parameters described in the Parameters section below.

Inputs


input: column

An arbitrary column, potentially containing missing values (NaN).

Outputs


output: column

A copy of the input column where missing values have been replaced by the specified constant or by the result of the chosen function.

Parameters


value: number | string | null | array

The constant to use to fill in missing values (normally of the same type as the original column).


function: string

Fill missing values with the result of a given function. The following functions can be used:

  • max: substitutes the NaN values with the maximum value of a numerical column.
  • min: substitutes the NaN values with the minimum value of a numerical column.
  • mean: substitutes the NaN values with the mean of a numerical column.
  • median: substitutes the NaN values with the median of a numerical column.
  • least_freq: substitutes the NaN values with the least frequent value of a column.
  • most_freq: substitutes the NaN values with the most frequent value of a column.
  • alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
  • alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
  • bfill: for each NaN value, uses the next valid observation to fill it.
  • ffill: for each NaN value, propagates the last valid observation forward to fill it.

Must be one of: "max", "min", "mean", "median", "least_freq", "most_freq", "alphabetical_first", "alphabetical_last", "bfill", "ffill"
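
The behaviour of the listed functions can be approximated with the following plain-Python sketch. It only illustrates the descriptions above and is not the step's implementation. The names `is_missing`, `directional_fill`, `AGGREGATES` and the Python-level `replace_missing` are hypothetical. Several details are assumptions: `None` and NaN both count as missing, ties in frequency are broken by first appearance, alphabetical order is taken to be Python's default string ordering, and values with no valid neighbour in the fill direction stay missing.

```python
import math
import statistics
from collections import Counter


def is_missing(x):
    """Assumption: both None and float NaN count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def _by_frequency(values, most):
    # most_common() orders by descending count; equal counts keep first-seen order.
    ranked = Counter(values).most_common()
    return ranked[0][0] if most else ranked[-1][0]


# Functions that reduce the non-missing values to a single fill value.
AGGREGATES = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "most_freq": lambda v: _by_frequency(v, most=True),
    "least_freq": lambda v: _by_frequency(v, most=False),
    "alphabetical_first": min,
    "alphabetical_last": max,
}


def directional_fill(column, reverse=False):
    """Forward fill by default; backward fill when reverse=True."""
    items = list(reversed(column)) if reverse else list(column)
    out, last, seen = [], None, False
    for x in items:
        if is_missing(x):
            # Keep the gap if no valid observation has been seen yet.
            out.append(last if seen else x)
        else:
            last, seen = x, True
            out.append(x)
    return out[::-1] if reverse else out


def replace_missing(column, function):
    if function == "ffill":
        return directional_fill(column)
    if function == "bfill":
        return directional_fill(column, reverse=True)
    if function not in AGGREGATES:
        raise ValueError(f"unsupported function: {function!r}")
    present = [x for x in column if not is_missing(x)]
    if not present:
        return list(column)
    fill = AGGREGATES[function](present)
    return [fill if is_missing(x) else x for x in column]


# Hypothetical sample data.
numbers = [None, 2.0, float("nan"), 4.0, 9.0, None]
colors = ["red", None, "blue", "red", "green", "green", "red"]

print(replace_missing(numbers, "mean"))        # [5.0, 2.0, 5.0, 4.0, 9.0, 5.0]
print(replace_missing(numbers, "ffill"))       # [None, 2.0, 2.0, 4.0, 9.0, 9.0]
print(replace_missing(numbers, "bfill"))       # [2.0, 2.0, 4.0, 4.0, 9.0, None]
print(replace_missing(colors, "least_freq")[1])  # blue
```

Note the difference between the two groups: the aggregate functions compute one value from the whole column and use it for every gap, whereas ffill and bfill fill each gap from its nearest valid neighbour, so the result depends on row order.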