replace_missing
Replace missing values (NaNs) with either a specified constant value or the result of a given function.
Usage
The following examples show how the step can be used in a recipe.
The following configuration fills all missing values with the string “unknown”:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
The constant used to fill in missing values (normally of the same type as the original column). Can be a scalar value (number or string) or an array of values (numbers or strings). If an array is passed, it should have at least one item.
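To make the constant-fill behaviour concrete, here is a minimal sketch in plain Python. It is not the step's implementation: the helper names `is_missing` and `fill_constant` are hypothetical, and it assumes a column is a list in which `None` or a float NaN marks a missing value. Only the scalar case is shown.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption made for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_constant(values, constant):
    """Return a copy of `values` with every missing entry replaced by `constant`."""
    return [constant if is_missing(v) else v for v in values]


first_names = ["Ada", None, "Grace", float("nan")]
filled = fill_constant(first_names, "unknown")
print(filled)  # ['Ada', 'unknown', 'Grace', 'unknown']
```

Using a constant of the same type as the column, as recommended above, keeps the filled column homogeneous.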
Fill missing values with the result of a given function. The following functions can be used:
- max: substitutes the NaN values with the maximum value of a numerical column.
- min: substitutes the NaN values with the minimum value of a numerical column.
- mean: substitutes the NaN values with the mean of a numerical column.
- median: substitutes the NaN values with the median of a numerical column.
- least_freq: substitutes the NaN values with the least frequent value of a column.
- most_freq: substitutes the NaN values with the most frequent value of a column.
- alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
- alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
- bfill: for each NaN value, uses the next valid observation to fill it.
- ffill: for each NaN value, propagates the last valid observation forward to fill it.
Values must be one of the following:
max
min
mean
median
least_freq
most_freq
alphabetical_first
alphabetical_last
bfill
ffill
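The fill functions listed above can be illustrated with a short sketch in plain Python. This is an approximation of the described behaviour, not the step's actual code: the helpers `is_missing`, `fill_with_statistic`, `ffill` and `bfill` are hypothetical, a column is assumed to be a list with `None` or float NaN marking missing values, and tie-breaking for `most_freq` and `least_freq` is not specified by the description above, so the sketch simply takes whatever `Counter.most_common` returns.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    """Treat None and float NaN as missing (an assumption made for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_with_statistic(values, function):
    """Fill missing entries with one value computed from the non-missing entries."""
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    statistics = {
        "max": lambda: max(present),
        "min": lambda: min(present),
        "mean": lambda: mean(present),
        "median": lambda: median(present),
        "most_freq": lambda: counts.most_common()[0][0],
        "least_freq": lambda: counts.most_common()[-1][0],
        "alphabetical_first": lambda: min(present),
        "alphabetical_last": lambda: max(present),
    }
    fill = statistics[function]()
    return [fill if is_missing(v) else v for v in values]


def ffill(values):
    """Propagate the last valid observation forward; leading gaps become None."""
    filled, last = [], None
    for v in values:
        if not is_missing(v):
            last = v
        filled.append(last)
    return filled


def bfill(values):
    """Use the next valid observation; trailing gaps become None."""
    return ffill(values[::-1])[::-1]


ages = [30, None, 50, 40, None, 30]
print(fill_with_statistic(ages, "mean"))       # [30, 37.5, 50, 40, 37.5, 30]
print(fill_with_statistic(ages, "most_freq"))  # [30, 30, 50, 40, 30, 30]

cities = ["Lima", None, "Oslo", "Cairo"]
print(fill_with_statistic(cities, "alphabetical_first"))  # ['Lima', 'Cairo', 'Oslo', 'Cairo']

readings = [1, None, None, 4, None]
print(ffill(readings))  # [1, 1, 1, 4, 4]
print(bfill(readings))  # [1, 4, 4, 4, None]
```

Note the difference between the two groups: the statistic-based functions fill every gap with the same value, while bfill and ffill depend on row order and can leave a gap unfilled at the end or start of the column respectively.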