By default, keeps only those rows where the values in all selected columns are not missing (non-NaN). Using the exclude parameter, the row selection can be inverted, so that only rows with missing values in the selected columns are returned.

Usage

The following example shows how the step can be used in a recipe.

Examples

To keep only those rows where neither “address” nor “name” is missing:
filter_missing(ds, {"columns": ["address", "name"]}) -> (ds_filtered)
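For readers familiar with pandas, the example above behaves roughly like dropping every row that has a missing value in any of the listed columns. The sketch below assumes the dataset is a pandas DataFrame and uses made-up sample data; it illustrates the documented behaviour and is not how the step itself is implemented.

```python
import pandas as pd

# Hypothetical stand-in for the recipe's input dataset `ds`.
ds = pd.DataFrame({
    "name": ["Ana", None, "Luis", "Mia"],
    "address": ["Main St 1", "Elm St 2", None, "Oak Ave 3"],
    "age": [34, 29, 50, None],
})

# Keep rows where neither "address" nor "name" is missing.
# Missing values in columns that are not listed (here "age") do not matter.
ds_filtered = ds.dropna(subset=["address", "name"])
print(ds_filtered)
```

Row 3 is kept even though its "age" is missing, because only the selected columns are checked.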

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in
dataset
required
An input dataset to filter.
ds_out
dataset
required
A dataset with the same columns as the input dataset, containing only the rows kept by the filter (or, with exclude set to true, only the rows it would otherwise drop).

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

columns
array[string]
required
Names of columns used to detect and filter rows containing missing values.
Item
string (ds_in.column)
Each item in array.
exclude
boolean
default: false
If true, the row selection is inverted: rows without missing values in the selected columns are excluded, i.e., only rows containing missing values in the selected columns are included in the resulting dataset.
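A minimal sketch of how the exclude flag could be read, assuming a pandas DataFrame as input. The helper name `filter_missing` and its signature here are hypothetical, and treating "inverted" as "at least one missing value in the selected columns" is an assumption drawn from the description above, not from the step's source.

```python
from __future__ import annotations

import pandas as pd


def filter_missing(df: pd.DataFrame, columns: list[str], exclude: bool = False) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented semantics (not the real step).

    exclude=False: keep rows with no missing value in any of `columns`.
    exclude=True:  keep the complementary rows, i.e. rows with at least one
                   missing value in `columns` (assumed reading of "inverted").
    """
    # True where every selected column holds a non-missing value.
    complete = df[columns].notna().all(axis=1)
    mask = ~complete if exclude else complete
    return df[mask]


ds = pd.DataFrame({
    "name": ["Ana", None, "Luis", "Mia"],
    "address": ["Main St 1", "Elm St 2", None, "Oak Ave 3"],
})

kept = filter_missing(ds, ["address", "name"])
missing = filter_missing(ds, ["address", "name"], exclude=True)
print(kept.index.tolist())     # [0, 3]
print(missing.index.tolist())  # [1, 2]
```

Because the two masks are complements, the two outputs together partition the input rows.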