Replace missing

fast step  missing values • NaN

Replace missing values (NaNs) with either a specified constant value or the result of a given function.


The following are the step's expected inputs and outputs and their specific types.

Step signature
replace_missing(input: column, {
    "param": value
}) -> (output: column)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.


The following configuration fills all missing values with the string "unknown":

Example call (in recipe editor)
replace_missing(ds.occupation, {"value": "unknown"}) -> (ds.occupation_filled)
More examples

The following configuration fills all missing values with the maximum of the column:

Example call (in recipe editor)
replace_missing(ds.numbers, {"function": "max"}) -> (ds.numbers_filled)


input: column

An arbitrary column, potentially containing missing values (NaN).


output: column

A copy of the input column where missing values have been replaced by the specified constant or by the result of the chosen function.


value: number | string | array

The constant used to fill in missing values (normally of the same type as the original column). Can be a scalar value (number or string) or an array of values (numbers or strings). If an array is passed, it must contain at least one item.
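To make the constant fill concrete, the following is a minimal Python sketch of what filling with a scalar value amounts to. It is an illustration only, not the step's implementation: the helper names `is_missing` and `fill_constant` are hypothetical, and a column is modelled as a plain list in which `None` and float NaN count as missing. Array values are not covered because their exact semantics are not described here.

```python
import math


def is_missing(x):
    """Assumption: None and float NaN both count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def fill_constant(column, value):
    """Return a copy of `column` with every missing entry replaced by `value`."""
    return [value if is_missing(x) else x for x in column]


# Mirrors the {"value": "unknown"} example from the recipe editor.
occupation = ["nurse", None, "teacher", float("nan")]
occupation_filled = fill_constant(occupation, "unknown")
print(occupation_filled)  # ['nurse', 'unknown', 'teacher', 'unknown']
```

The input list is left untouched, matching the description of the output as a copy of the input column.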

function: string

Fill missing values with the result of a given function. The following functions can be used:

  • max: substitutes the NaN values with the maximum value of a numerical column.
  • min: substitutes the NaN values with the minimum value of a numerical column.
  • mean: substitutes the NaN values with the mean of a numerical column.
  • median: substitutes the NaN values with the median of a numerical column.
  • least_freq: substitutes the NaN values with the least frequent value of a column.
  • most_freq: substitutes the NaN values with the most frequent value of a column.
  • alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
  • alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
  • bfill: for each NaN value, uses the next valid observation to fill it.
  • ffill: for each NaN value, propagates the last valid observation forward to fill it.

Must be one of: "max", "min", "mean", "median", "least_freq", "most_freq", "alphabetical_first", "alphabetical_last", "bfill", "ffill"
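The functions listed above can be sketched in plain Python as follows. This is a hedged illustration of the described behaviour, not the step's actual code: `fill_with_function`, `_directional_fill` and `is_missing` are hypothetical names, columns are modelled as lists with `None` or NaN as missing, and the handling of ties (for most_freq and least_freq) and of gaps with no valid neighbour (for bfill and ffill) is an assumption, since it is not specified here.

```python
import math
import statistics
from collections import Counter

FUNCTIONS = ("max", "min", "mean", "median", "least_freq", "most_freq",
             "alphabetical_first", "alphabetical_last", "bfill", "ffill")


def is_missing(x):
    """Assumption: None and float NaN both count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def _directional_fill(column, backward):
    """Propagate the nearest valid value forward (ffill) or backward (bfill)."""
    items = list(reversed(column)) if backward else list(column)
    filled, last_valid = [], None
    for x in items:
        if is_missing(x):
            # Assumption: a gap with no valid value to copy from stays missing.
            filled.append(x if last_valid is None else last_valid)
        else:
            last_valid = x
            filled.append(x)
    return filled[::-1] if backward else filled


def fill_with_function(column, function):
    """Return a copy of `column` with missing entries filled as `function` says."""
    if function not in FUNCTIONS:
        raise ValueError(f"function must be one of {FUNCTIONS}")
    if function in ("bfill", "ffill"):
        return _directional_fill(column, backward=(function == "bfill"))

    present = [x for x in column if not is_missing(x)]
    if not present:
        return list(column)  # nothing to compute a fill value from

    ranked = Counter(present).most_common()  # most frequent first
    if function in ("max", "alphabetical_last"):
        fill = max(present)
    elif function in ("min", "alphabetical_first"):
        fill = min(present)
    elif function == "mean":
        fill = statistics.mean(present)
    elif function == "median":
        fill = statistics.median(present)
    elif function == "most_freq":
        fill = ranked[0][0]
    else:  # least_freq
        fill = ranked[-1][0]
    return [fill if is_missing(x) else x for x in column]


numbers = [3, None, 7, None, 5]
print(fill_with_function(numbers, "max"))    # [3, 7, 7, 7, 5]
print(fill_with_function(numbers, "ffill"))  # [3, 3, 7, 7, 5]
print(fill_with_function(numbers, "bfill"))  # [3, 7, 7, 5, 5]
```

The aggregate functions compute a single fill value from the non-missing entries, whereas bfill and ffill depend on row order, so the same column can yield different results if it is sorted differently beforehand.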