Filter missing

missing data · NaN

Filter rows based on missing values in one or more columns.

By default, the step keeps only those rows where the values in the selected columns are not missing (not NaN). With the exclude parameter, the row selection can be inverted so that only rows with missing values in the selected columns are returned.


To keep only those rows where neither "address" nor "name" is missing:

filter_missing(ds, {"columns": ["address", "name"]}) -> (ds_filtered)


The following are the step's expected inputs and outputs and their specific types.

filter_missing(ds_in: dataset, {"param": value}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.


ds_in: dataset

An input dataset to filter.


ds_out: dataset

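A dataset with the same columns as the input dataset, containing only the rows selected by the filter: rows without missing values in the selected columns by default, or rows with missing values if exclude is true.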


columns: array[string]

Names of columns used to detect and filter rows containing missing values.

Items in columns

item: string

exclude: boolean = False

If true, rows with non-missing values will be excluded, i.e. only rows containing missing values in the selected columns will be included in the resulting dataset.
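The effect of columns and exclude can be illustrated with a small plain-Python sketch. This is not the step's implementation: the Python function below is a hypothetical stand-in that operates on a list of dicts, and it assumes that None and float NaN count as missing. It also assumes that, with exclude set, a row is returned when at least one of the selected columns is missing (the exact inverse of the default selection); the step itself may define this differently.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_missing(rows, columns, exclude=False):
    """Hypothetical stand-in for the step, working on a list of dict rows."""
    kept = []
    for row in rows:
        # A row counts as "missing" if any selected column is missing.
        has_missing = any(is_missing(row.get(col)) for col in columns)
        # Default: keep complete rows. exclude=True: keep rows with a gap.
        if has_missing == exclude:
            kept.append(row)
    return kept


rows = [
    {"name": "Ada", "address": "1 Main St"},
    {"name": None, "address": "2 Side St"},
    {"name": "Bob", "address": float("nan")},
]

complete = filter_missing(rows, ["address", "name"])
incomplete = filter_missing(rows, ["address", "name"], exclude=True)

print([r["name"] for r in complete])    # ['Ada']
print([r["name"] for r in incomplete])  # [None, 'Bob']
```

Under these assumptions the two calls partition the input: every row ends up in exactly one of the two results.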