Replace missing
fast step missing values • NaN
Replace missing values (NaNs) with either a specified constant value or the result of a given function.
Usage
The following are the step's expected inputs and outputs and their specific types.
replace_missing(input: column, {"param": value}) -> (output: column)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example
The following configuration fills all missing values with the string "unknown":
replace_missing(ds.occupation, {"value": "unknown"}) -> (ds.occupation_filled)
More examples
The following configuration fills all missing values with the maximum of the column:
replace_missing(ds.numbers, {"function": "max"}) -> (ds.numbers_filled)
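To make the effect of these two configurations concrete, here is a minimal plain-Python sketch of the same idea. It is an illustration only, not the step's actual implementation: the helper names (is_missing, fill_constant, fill_max) and the sample data are hypothetical, and treating both None and float NaN as "missing" is an assumption.

```python
import math

def is_missing(x):
    # Assumption for this sketch: None and float NaN both count as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))

def fill_constant(column, value):
    # Mirrors {"value": ...}: every missing entry becomes the same constant.
    return [value if is_missing(x) else x for x in column]

def fill_max(column):
    # Mirrors {"function": "max"}: compute the maximum over the present
    # values only, then use it for every missing entry.
    present = [x for x in column if not is_missing(x)]
    if not present:
        return list(column)  # nothing to compute a maximum from
    top = max(present)
    return [top if is_missing(x) else x for x in column]

# Hypothetical sample columns
occupation = ["nurse", None, "teacher", None]
numbers = [3.0, float("nan"), 7.5, 1.0]

occupation_filled = fill_constant(occupation, "unknown")
numbers_filled = fill_max(numbers)

print(occupation_filled)  # ['nurse', 'unknown', 'teacher', 'unknown']
print(numbers_filled)     # [3.0, 7.5, 7.5, 1.0]
```

In both cases the non-missing entries are left untouched and the output has the same length as the input.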
Inputs
input: column
An arbitrary column, potentially containing missing values (NaN).
Outputs
output: column
A copy of the input column where missing values have been replaced by the specified constant or by the result of the chosen function.
Parameters
value: number | string | null | array
The constant used to fill in missing values (normally of the same type as the original column).
function: string
Fill missing values with the result of a given function. The following functions can be used:
- max: substitutes the NaN values with the maximum value of a numerical column.
- min: substitutes the NaN values with the minimum value of a numerical column.
- mean: substitutes the NaN values with the mean of a numerical column.
- median: substitutes the NaN values with the median of a numerical column.
- least_freq: substitutes the NaN values with the least frequent value of a column.
- most_freq: substitutes the NaN values with the most frequent value of a column.
- alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
- alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
- bfill: for each NaN value, uses the next valid observation to fill it.
- ffill: for each NaN value, propagates the last valid observation forward to fill it.
Must be one of: "max", "min", "mean", "median", "least_freq", "most_freq", "alphabetical_first", "alphabetical_last", "bfill", "ffill"
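The difference between "ffill" and "bfill" is easiest to see on a small column. The following is a plain-Python sketch of the two strategies as described above, not the step's own code. The function names and sample data are hypothetical, and the handling of edge cases is an assumption: a missing value with no earlier valid observation (for ffill) or no later one (for bfill) is left missing here.

```python
import math

def is_missing(x):
    # Assumption for this sketch: None and float NaN both count as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))

def ffill(column):
    # Forward fill: carry the last valid observation forward.
    out = []
    last, seen = None, False
    for x in column:
        if is_missing(x):
            # Leading gaps have no earlier observation, so they stay missing.
            out.append(last if seen else x)
        else:
            last, seen = x, True
            out.append(x)
    return out

def bfill(column):
    # Backward fill: use the next valid observation.
    # Equivalent to a forward fill over the reversed column.
    return ffill(column[::-1])[::-1]

# Hypothetical sample column with gaps at the start, middle, and end
readings = [None, 1.0, None, None, 4.0, None]

ffilled = ffill(readings)
bfilled = bfill(readings)

print(ffilled)  # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
print(bfilled)  # [1.0, 1.0, 4.0, 4.0, 4.0, None]
```

Unlike the aggregate functions (max, mean, most_freq, and so on), which fill every gap with one column-wide value, these two depend on row order, so the same column sorted differently can be filled differently.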