
Explode

Explode (extract) items from column(s) of lists into separate rows.

Each element of an exploded list results in a new row in the resulting dataset, i.e. the transformation creates a taller dataset than the original, but one with the same number of columns. Note: to unpack lists into separate columns, see the step unpack_list instead.
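To make the row-wise effect concrete, the following is a minimal sketch in plain Python, not the step's actual implementation. It models a dataset as a list of dicts; the helper name explode_column and the sample data are hypothetical.

```python
def explode_column(rows, column):
    """Return one output row per element of each list in `column`."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)     # copy the other columns unchanged
            new_row[column] = item  # replace the list with a single element
            out.append(new_row)
    return out


# Hypothetical input: 2 rows, 2 columns, one of which holds lists.
rows = [
    {"id": 1, "tags": ["a", "b", "c"]},
    {"id": 2, "tags": ["d"]},
]

exploded = explode_column(rows, "tags")
for row in exploded:
    print(row)
```

The result has 4 rows (3 from the first input row, 1 from the second) and still the same two columns. How the real step treats empty or missing lists is not covered here.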

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
explode(ds: dataset, {
    "param": value
}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.

Inputs


ds: dataset

An input dataset having at least one column containing lists.

Outputs


ds_out: dataset

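A taller output dataset in which the selected list columns have been exploded into one element per row. List columns not selected via the columns parameter remain list columns.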

Parameters


columns: null | string | array[string]

The column or columns to explode. Any list columns in the input dataset not mentioned here will not be exploded, i.e. they remain list columns in the output dataset. If null, attempts to explode all columns containing lists.
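As an illustration of the three accepted forms (null, a single name, or a list of names), the sketch below normalizes the parameter to a list of column names. The function resolve_columns is hypothetical, and detecting list columns by inspecting the values is an assumption; the step itself may rely on column types instead.

```python
def resolve_columns(rows, columns=None):
    """Normalize a null | string | list-of-strings selection to a list."""
    if columns is None:
        # Assumption: a column counts as a list column if any value is a list.
        names = list(rows[0].keys()) if rows else []
        return [c for c in names
                if any(isinstance(r.get(c), list) for r in rows)]
    if isinstance(columns, str):
        return [columns]
    return list(columns)


# Hypothetical dataset with two list columns and one scalar column.
rows = [
    {"id": 1, "tags": ["a", "b"], "scores": [0.1, 0.9]},
    {"id": 2, "tags": ["c"], "scores": [0.5]},
]

print(resolve_columns(rows))            # all list columns
print(resolve_columns(rows, "tags"))    # a single column name
print(resolve_columns(rows, ["tags"]))  # an explicit list
```

With "tags" selected, the "scores" column would be left untouched and stay a list column in the output.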


parallel: boolean = True

Whether to explode the selected columns together. If true, assumes that all columns to be exploded contain lists of the same length in any given row. In this case, a row containing two lists with 5 elements each produces 5 rows in the output dataset, with elements matched by position.

If false, on the other hand, the columns are exploded iteratively, one at a time. A row containing two lists with 5 elements each will therefore produce 25 rows in the output dataset: exploding the first column produces 5 rows, and exploding each of these again on the second column produces 5 rows in turn.
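The difference between the two modes can be sketched in plain Python as follows. This is an illustration of the behaviour described above rather than the step's implementation: the function explode and the sample row are hypothetical, and raising an error on lists of unequal length in parallel mode is an assumption.

```python
def explode(rows, columns, parallel=True):
    """Explode `columns` together (parallel) or one after another."""
    if not columns:
        return list(rows)
    if parallel:
        out = []
        for row in rows:
            lists = [row[c] for c in columns]
            # Assumption: unequal lengths are treated as an error here.
            if len({len(values) for values in lists}) > 1:
                raise ValueError("parallel explode needs equal-length lists")
            for items in zip(*lists):  # match elements by position
                new_row = dict(row)
                new_row.update(zip(columns, items))
                out.append(new_row)
        return out
    # Iterative mode: each pass multiplies rows by the list length.
    for column in columns:
        rows = [{**row, column: item} for row in rows for item in row[column]]
    return rows


# Hypothetical row holding two lists with 5 elements each.
row = {"id": 1, "x": [1, 2, 3, 4, 5], "y": ["a", "b", "c", "d", "e"]}

together = explode([row], ["x", "y"], parallel=True)
one_by_one = explode([row], ["x", "y"], parallel=False)
print(len(together), len(one_by_one))
```

In parallel mode the 5 output rows pair x and y by position (1 with "a", 2 with "b", and so on), while the iterative mode yields every combination of an x element with a y element, hence 5 × 5 = 25 rows.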