Filter missing

missing data, NaN

Filter rows based on missing values in one or more columns.

By default, the step keeps only those rows where the values in the selected columns are not missing (non-NaN). Setting the exclude parameter inverts the selection, so that only rows with missing values in the selected columns are returned.


The following are the step's expected inputs and outputs and their specific types.

Step signature
filter_missing(ds_in: dataset, {
    "param": value
}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.


To keep only those rows where neither "address" nor "name" is missing:

Example call (in recipe editor)
filter_missing(ds, {"columns": ["address", "name"]}) -> (ds_filtered)
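
The filtering rule behind the example call above can be illustrated outside the recipe editor with a small plain-Python sketch. This is not the step's actual implementation: the Python function `filter_missing`, the helper `is_missing`, and the sample rows are hypothetical, and treating both `None` and float NaN as "missing" is an assumption made for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_missing(rows, columns, exclude=False):
    """Keep rows with no missing value in `columns`; invert when exclude=True."""
    kept = []
    for row in rows:
        # A row "has missing" if any of the selected columns is missing.
        has_missing = any(is_missing(row.get(col)) for col in columns)
        if has_missing == exclude:
            kept.append(row)
    return kept


# Hypothetical sample data: a list of dicts standing in for a dataset.
ds = [
    {"name": "Ada", "address": "12 Main St"},
    {"name": None, "address": "4 Elm Rd"},
    {"name": "Grace", "address": float("nan")},
]

ds_filtered = filter_missing(ds, ["address", "name"])
print(ds_filtered)
```

With this sample data only the "Ada" row survives, since it is the only row where neither "address" nor "name" is missing.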


ds_in: dataset

An input dataset to filter.


ds_out: dataset

A dataset with the same columns as the input dataset, containing only the rows that pass the filter (rows without missing values by default, or rows with missing values if exclude is set).


columns: array[string]

Names of columns used to detect and filter rows containing missing values.

Items in columns

item: string

exclude: boolean = False

If true, rows with non-missing values are excluded, i.e. only rows containing missing values in the selected columns are included in the resulting dataset.
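
As a rough illustration of how exclude relates to the default behaviour, the plain-Python sketch below splits a dataset into the rows each setting would return. The helper `has_missing` and the sample rows are hypothetical. The description does not say whether a row with a missing value in only one of several selected columns is returned when exclude is true; this sketch assumes a strict inversion of the default, so such a row is returned.

```python
import math


def has_missing(row, columns):
    """True if any selected column is None or float NaN (assumed definition)."""
    for col in columns:
        value = row.get(col)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return True
    return False


# Hypothetical sample data.
ds = [
    {"id": 1, "name": "Ada", "address": "12 Main St"},
    {"id": 2, "name": None, "address": "4 Elm Rd"},
    {"id": 3, "name": None, "address": None},
]
columns = ["address", "name"]

# Rows returned with exclude left at its default (false).
complete = [row for row in ds if not has_missing(row, columns)]
# Rows returned with exclude set to true, assuming strict inversion.
incomplete = [row for row in ds if has_missing(row, columns)]

print([row["id"] for row in complete])
print([row["id"] for row in incomplete])
```

Under that assumption the two results never overlap and together cover every input row, which makes the pair useful for separating complete records from those needing cleanup.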