
Usage

The following examples show how the step can be used in a recipe.

Examples

The following configuration fills all missing values with the string “unknown”:
replace_missing(ds.occupation, {"value": "unknown"}) -> (ds.occupation_filled)
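As a rough illustration of what this configuration does, the sketch below reproduces the effect on a plain Python list. It assumes that "missing" means `None` or a float NaN. `replace_missing_constant` is a hypothetical helper written for this example and is not the step's actual implementation.

```python
import math


def is_missing(x):
    # Assumption for this sketch: missing means None or a float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def replace_missing_constant(column, value):
    """Return a new list with every missing entry replaced by `value`.

    The input list is left unchanged, mirroring how the step writes its
    result to a separate output column (here, occupation_filled).
    """
    return [value if is_missing(x) else x for x in column]


occupation = ["teacher", None, "nurse", float("nan")]
occupation_filled = replace_missing_constant(occupation, "unknown")
print(occupation_filled)  # ['teacher', 'unknown', 'nurse', 'unknown']
```

The original `occupation` list still contains its missing entries afterwards. This matches the step's contract that the output is a copy of the input column.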

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
input (column, required)
An arbitrary column, potentially containing missing values (NaN).
output (column, required)
A copy of the input column in which missing values have been replaced, either by a constant or by the result of the configured function.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
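To make the convention concrete, the following sketch assembles a step call by serialising the parameter object as JSON and appending it as the last input. `format_step_call` is a hypothetical helper. Two behaviours are assumptions not stated above: several inputs or outputs are joined with ", ", and the JSON object is omitted when no parameters are given.

```python
import json


def format_step_call(step, inputs, params, outputs):
    """Build a recipe line of the form step(inputs..., {params}) -> (outputs).

    Hypothetical helper: the parameter dict is serialised with json.dumps,
    so keys and string values use double quotes as in the examples.
    """
    args = list(inputs)
    if params:
        # The configuration object is always the last "input".
        args.append(json.dumps(params))
    return f"{step}({', '.join(args)}) -> ({', '.join(outputs)})"


line = format_step_call(
    "replace_missing", ["ds.occupation"], {"value": "unknown"}, ["ds.occupation_filled"]
)
print(line)
# replace_missing(ds.occupation, {"value": "unknown"}) -> (ds.occupation_filled)
```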

Parameters

value
[number, string, array[number, string]]
The constant used to fill in missing values (normally of the same type as the original column). It can be a scalar (a number or a string) or an array of numbers or strings. If an array is passed, it should contain at least one item.
Item
[number, string]
Each item in the array.
function
string
Fill missing values with the result of a given function. The following functions can be used:
  • max: substitutes the NaN values with the maximum value of a numerical column.
  • min: substitutes the NaN values with the minimum value of a numerical column.
  • mean: substitutes the NaN values with the mean of a numerical column.
  • median: substitutes the NaN values with the median of a numerical column.
  • least_freq: substitutes the NaN values with the least frequent value of a column.
  • most_freq: substitutes the NaN values with the most frequent value of a column.
  • alphabetical_first: substitutes the NaN values with the alphabetically first value of a categorical column.
  • alphabetical_last: substitutes the NaN values with the alphabetically last value of a categorical column.
  • bfill: for each NaN value, uses the next valid observation to fill it.
  • ffill: for each NaN value, propagates the last valid observation forward to fill it.
Values must be one of the following: max, min, mean, median, least_freq, most_freq, alphabetical_first, alphabetical_last, bfill, ffill.
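The functions listed above can be approximated on a plain Python list as in the following sketch. It is an illustration, not the step's implementation, and `fill_with_function` is a hypothetical name. The sketch makes three undocumented assumptions. Ties in least_freq and most_freq go to the value seen first. Alphabetical order means plain code-point string ordering. Gaps that have no valid neighbour in the fill direction stay missing: leading gaps for ffill and trailing gaps for bfill.

```python
import math
import statistics
from collections import Counter


def is_missing(x):
    # Assumption: missing means None or a float NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def forward_fill(column):
    """ffill: propagate the last valid value forward; leading gaps stay missing."""
    result, last = [], None
    for x in column:
        if is_missing(x):
            result.append(last if last is not None else x)
        else:
            last = x
            result.append(x)
    return result


def fill_with_function(column, function):
    """Return a copy of `column` with missing entries filled by `function`."""
    if function == "ffill":
        return forward_fill(column)
    if function == "bfill":
        # bfill is ffill applied to the reversed column; trailing gaps stay missing.
        return forward_fill(column[::-1])[::-1]

    valid = [x for x in column if not is_missing(x)]
    if not valid:
        raise ValueError("no valid values to compute a fill value from")
    counts = Counter(valid)
    aggregates = {
        "max": max,
        "min": min,
        "mean": statistics.mean,
        "median": statistics.median,
        # Ties go to the value seen first (an assumption; not documented).
        "least_freq": lambda _: min(counts, key=counts.get),
        "most_freq": lambda _: max(counts, key=counts.get),
        # Plain code-point ordering, so "Zoe" sorts before "adam".
        "alphabetical_first": min,
        "alphabetical_last": max,
    }
    fill = aggregates[function](valid)
    return [fill if is_missing(x) else x for x in column]


nan = float("nan")
ages = [30, nan, 20, 40, None]
print(fill_with_function(ages, "median"))  # [30, 30, 20, 40, 30]
print(fill_with_function(ages, "ffill"))   # [30, 30, 20, 40, 40]

jobs = ["nurse", None, "baker", "nurse", "clerk"]
print(fill_with_function(jobs, "least_freq"))  # ['nurse', 'baker', 'baker', 'nurse', 'clerk']
```

The aggregate functions compute a single replacement value from the non-missing entries. In contrast, ffill and bfill pick a value per gap from its neighbours, so different gaps in the same column can receive different values.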
I