
Filter missing

missing data · NaN

Filter rows based on missing values in one or more columns.

By default, keeps only those rows where the values in the selected columns are not missing (non-NaN). Using the exclude parameter, the row selection can be inverted, so that only rows with missing values in the selected columns are returned.
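The row selection described above can be sketched in plain Python. This is a minimal sketch, not the step's actual implementation. It assumes a dataset is a list of dicts, treats `None` and float NaN as missing, and uses a hypothetical function name, `filter_missing_rows`. It also assumes the inversion is an exact complement: with `exclude=True` it keeps rows with a missing value in at least one selected column.

```python
import math
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Assumption: None and float NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_missing_rows(
    rows: List[Row], columns: Sequence[str], exclude: bool = False
) -> List[Row]:
    """Hypothetical sketch of filter_missing's row selection.

    Default (exclude=False): keep rows where none of `columns` is missing.
    exclude=True: keep the complement, i.e. rows with at least one
    missing value in `columns` (assumed semantics).
    """
    result = []
    for row in rows:
        # A key absent from the row is also treated as missing (assumption).
        has_missing = any(is_missing(row.get(col)) for col in columns)
        # Keep the row when its "has a gap" status matches the requested mode.
        if has_missing == exclude:
            result.append(row)
    return result
```

For instance, `filter_missing_rows(rows, ["address", "name"])` would keep only the rows in which both fields are present.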

Example

To keep only those rows where neither "address" nor "name" is missing:

filter_missing(ds, {"columns": ["address", "name"]}) -> (ds_filtered)
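To make the example concrete, the following plain-Python sketch applies the same rule to a few made-up rows. The data is hypothetical, and `None` and NaN stand in for missing values (an assumption about how missing values are represented). Only rows in which both "address" and "name" are present survive.

```python
import math

# Hypothetical sample rows standing in for the dataset `ds`.
ds = [
    {"name": "Ana", "address": "1 Main St"},
    {"name": None, "address": "2 Oak Ave"},     # name missing
    {"name": "Ben", "address": float("nan")},   # address missing
    {"name": "Cleo", "address": "3 Elm Rd"},
]


def is_missing(value):
    # Assumption: None and float NaN are both treated as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


# Keep rows where neither "address" nor "name" is missing.
ds_filtered = [
    row for row in ds
    if not any(is_missing(row[col]) for col in ("address", "name"))
]

print([row["name"] for row in ds_filtered])  # ['Ana', 'Cleo']
```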

Usage

The following are the step's expected inputs and outputs and their specific types.

filter_missing(ds_in: dataset, {"param": value}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.

Inputs


ds_in: dataset

An input dataset to filter.

Outputs


ds_out: dataset

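A dataset containing the same columns as the input dataset, but only the rows selected by the filter: by default the rows without missing values in the selected columns, or, if exclude is true, only the rows with missing values.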

Parameters


columns: array[string]

Names of columns used to detect and filter rows containing missing values.

Items in columns

item: string


exclude: boolean = False

If true, rows with non-missing values are excluded, i.e. only rows containing missing values in the selected columns are included in the resulting dataset.
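Because exclude only inverts the selection, running the step once with the default and once with exclude set to true should split the input into two disjoint parts. The sketch below illustrates this in plain Python as a minimal model, not the step itself. It assumes rows are dicts, that `None` and NaN count as missing, and that the inverted selection is the exact complement (rows with at least one missing value). `select` is a hypothetical helper.

```python
import math


def is_missing(value):
    # Assumption: None and float NaN are both treated as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def select(rows, columns, exclude=False):
    # Hypothetical helper: the default keeps complete rows; exclude=True keeps
    # rows with a missing value in at least one of `columns`.
    return [r for r in rows
            if any(is_missing(r.get(c)) for c in columns) == exclude]


rows = [
    {"id": 1, "address": "1 Main St", "name": "Ana"},
    {"id": 2, "address": None, "name": "Ben"},
    {"id": 3, "address": "3 Elm Rd", "name": float("nan")},
]

kept = select(rows, ["address", "name"])                   # complete rows
missing = select(rows, ["address", "name"], exclude=True)  # rows with gaps

print([r["id"] for r in kept])     # [1]
print([r["id"] for r in missing])  # [2, 3]
```

Under these assumptions, every input row appears in exactly one of the two outputs.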