replace_missing
Replace missing values (NaNs) with either a specified constant value or the result of a given function.
input
column
required
An arbitrary column, potentially containing missing values (NaN).
output
column
required
A copy of the input column where missing values have been replaced by the specified constant or by the result of the chosen function.
value
number, string, or array[number | string]
The constant used to fill in missing values (normally of the same type as the original column). Can be a scalar value (number or string) or an array of values (numbers or strings). If an array is passed, it must contain at least one item.
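To make the constant-fill behaviour concrete, here is a minimal sketch in plain Python. It is not the step's actual implementation: `is_missing` and `fill_constant` are hypothetical helper names, and treating `None` as missing alongside NaN is an assumption of this sketch.

```python
import math


def is_missing(x):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def fill_constant(column, value):
    """Return a copy of `column` with missing entries replaced by `value`."""
    return [value if is_missing(x) else x for x in column]


ages = [31.0, math.nan, 47.0, None]
filled = fill_constant(ages, 0)
print(filled)  # [31.0, 0, 47.0, 0]
```

The input list is left untouched, mirroring the description of the output as a copy of the input column.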
function
string
Fill missing values with the result of a given function. The following functions can be used:
- max: substitutes the NaN values with the maximum value of a numerical column.
- min: substitutes the NaN values with the minimum value of a numerical column.
- mean: substitutes the NaN values with the mean of a numerical column.
- median: substitutes the NaN values with the median of a numerical column.
- least_freq: substitutes the NaN values with the least frequent value of a column.
- most_freq: substitutes the NaN values with the most frequent value of a column.
- alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
- alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
- bfill: for each NaN value, uses the next valid observation to fill it.
- ffill: for each NaN value, propagates the last valid observation forward to fill it.
Values must be one of the following:
max
min
mean
median
least_freq
most_freq
alphabetical_first
alphabetical_last
bfill
ffill
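The function options listed above can be approximated in plain Python as follows. This is a sketch under stated assumptions, not the step's real implementation: `fill_with_function`, `is_missing` and `forward_fill` are hypothetical names, `None` is treated as missing alongside NaN, ties in `most_freq` and `least_freq` are broken by whatever order `collections.Counter` returns, and leading (for `ffill`) or trailing (for `bfill`) missing values are left unfilled because no valid observation exists on that side.

```python
import math
import statistics
from collections import Counter


def is_missing(x):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def forward_fill(column):
    """Propagate the last valid observation forward over missing entries."""
    out, last, seen = [], None, False
    for x in column:
        if is_missing(x):
            # Leading gaps stay missing: nothing valid has been seen yet.
            out.append(last if seen else x)
        else:
            last, seen = x, True
            out.append(x)
    return out


# Each aggregator maps the non-missing values to a single fill value.
AGGREGATORS = {
    "max": max,
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "most_freq": lambda v: Counter(v).most_common()[0][0],
    "least_freq": lambda v: Counter(v).most_common()[-1][0],
    "alphabetical_first": min,
    "alphabetical_last": max,
}


def fill_with_function(column, function):
    """Return a copy of `column` with missing entries filled per `function`."""
    if function == "ffill":
        return forward_fill(column)
    if function == "bfill":
        # Backward fill is a forward fill over the reversed column.
        return forward_fill(column[::-1])[::-1]
    if function not in AGGREGATORS:
        raise ValueError(f"unsupported function: {function!r}")
    present = [x for x in column if not is_missing(x)]
    if not present:
        return list(column)  # no valid value to derive a fill from
    fill = AGGREGATORS[function](present)
    return [fill if is_missing(x) else x for x in column]


nums = [1.0, math.nan, 3.0, math.nan, 8.0]
print(fill_with_function(nums, "mean"))   # [1.0, 4.0, 3.0, 4.0, 8.0]
print(fill_with_function(nums, "ffill"))  # [1.0, 1.0, 3.0, 3.0, 8.0]
print(fill_with_function(nums, "bfill"))  # [1.0, 3.0, 3.0, 8.0, 8.0]

cats = ["b", None, "a", "b", None, "c"]
print(fill_with_function(cats, "most_freq"))  # ['b', 'b', 'a', 'b', 'b', 'c']
```

The aggregate options compute one value from the non-missing entries and reuse it for every gap, whereas `bfill` and `ffill` depend on each gap's position in the column.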