
Is missing

fast step • missing values • NaN

Check for missing values in a given column.

This step checks each row of the input column to determine whether its value is missing (null or NaN). The result is a new column in which each row indicates whether the corresponding element of the input column is missing. By default, this output column is boolean.

The step works with both single-valued and multi-valued columns. The output can be configured as boolean (true/false), numeric (0/1), or categorical (custom labels).

  • For single-valued columns: Each row in the output column will be true if the corresponding value in the input column is missing, and false otherwise.
  • For multi-valued columns: Each row in the output column will be true if the corresponding sub-list in the input column is empty, and false otherwise.
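The per-row rules for single-valued and multi-valued columns can be sketched in a few lines of Python. This is a minimal illustration, not the step's actual implementation. The function names `is_missing` and `_is_null_or_nan`, the `multivalued` flag, and the use of plain Python lists as columns are all assumptions. The sketch also assumes that a null cell in a multi-valued column counts as missing, and that empty strings do not, because the description above mentions only null, NaN, and empty sub-lists.

```python
import math
from typing import Any, List


def _is_null_or_nan(value: Any) -> bool:
    """True if value is null (None) or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_missing(column: List[Any], multivalued: bool = False) -> List[bool]:
    """Hypothetical helper: one boolean per row, True where the row is missing."""
    result = []
    for cell in column:
        if multivalued:
            # Multi-valued rows are missing when their sub-list is empty.
            # Assumption: a null cell (no list at all) is also treated as missing.
            result.append(cell is None or len(cell) == 0)
        else:
            # Single-valued rows are missing when the value is null or NaN.
            result.append(_is_null_or_nan(cell))
    return result


print(is_missing([1.5, None, float("nan"), 0.0]))
print(is_missing([["a", "b"], [], None, ["c"]], multivalued=True))
```

Note that `0.0` and `""` are real values, not missing ones. Only `None` and NaN trigger the single-valued check.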

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
is_missing(column: column, {
    "param": value
}) -> (result: column)

Here, the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.
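The following Python sketch shows one way the optional parameter object could be combined with the documented defaults (out_type "boolean", labels "True"/"False"). The function `resolve_params` is hypothetical. The lowercase label keys "true" and "false" are an assumption taken from the `params["labels"]["true"]` notation in the Parameters section. Merging partially supplied labels with their defaults is also an assumption.

```python
from typing import Any, Dict, Optional

ALLOWED_OUT_TYPES = ("boolean", "number", "category")

# Documented defaults. The lowercase label keys are an assumption.
DEFAULT_LABELS = {"true": "True", "false": "False"}


def resolve_params(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical: merge an optional {"param": value} object with defaults."""
    params = params or {}
    out_type = params.get("out_type", "boolean")
    if out_type not in ALLOWED_OUT_TYPES:
        raise ValueError(
            f"out_type must be one of {ALLOWED_OUT_TYPES}, got {out_type!r}"
        )
    # Any label the caller does not supply falls back to its default.
    labels = {**DEFAULT_LABELS, **params.get("labels", {})}
    return {"out_type": out_type, "labels": labels}


print(resolve_params())
print(resolve_params({"out_type": "number"}))
```

Validating `out_type` up front mirrors the "Must be one of" constraint listed for that parameter.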

Example

Check for missing values in a numeric column.

Example call (in recipe editor)
is_missing(ds.numeric_col) -> (ds.numeric_col_missing)
More examples

Check for missing values in a text column and set the output type to numeric.

Example call (in recipe editor)
is_missing(ds.string_col, {"out_type": "number"}) -> (ds.string_col_missing)

Inputs


column: column

The input column to check for missing values.

Outputs


result: column

The output column indicating the presence of missing values in the input column.

Parameters


out_type: string = "boolean"

Output type. The data type of the output column:
  • "boolean": the output is true/false, indicating whether each value is missing.
  • "number": the output is 1 for missing values and 0 otherwise.
  • "category": the output uses the labels given by params["labels"]["true"] (missing) and params["labels"]["false"] (not missing).

Must be one of: "boolean", "number", "category"
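To illustrate how the three output types relate, here is a small Python sketch that converts per-row missing flags into each encoding. The function `encode` is hypothetical. The label keys "true"/"false" and the fallback to the default labels "True"/"False" are assumptions based on the labels parameter described below.

```python
from typing import Dict, List, Optional, Union

Cell = Union[bool, int, str]


def encode(flags: List[bool], out_type: str = "boolean",
           labels: Optional[Dict[str, str]] = None) -> List[Cell]:
    """Hypothetical: convert per-row missing flags into the requested out_type."""
    if out_type == "boolean":
        return list(flags)                       # True = missing, False = present
    if out_type == "number":
        return [1 if f else 0 for f in flags]    # 1 = missing, 0 = present
    if out_type == "category":
        # Assumed label keys; unspecified labels fall back to the defaults.
        merged = {"true": "True", "false": "False", **(labels or {})}
        return [merged["true"] if f else merged["false"] for f in flags]
    raise ValueError(f"unsupported out_type: {out_type!r}")


flags = [False, True, False]
print(encode(flags, "number"))
print(encode(flags, "category", {"true": "missing", "false": "present"}))
```

All three encodings carry the same information. The choice mainly affects how downstream steps or charts consume the column.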


labels: object

Custom labels for the "true" (missing) and "false" (not missing) categories, used when out_type is "category". An object mapping each of the two categories to a label.

Items in labels

True: string = True

Label for the "true" category.


False: string = False

Label for the "false" category.