Explode
Explode (extract) items from column(s) of lists into separate rows.
Each element of an exploded list results in a new row in the resulting dataset, i.e. the transformation generally creates a taller dataset than the original, but one with the same number of columns.
Note: to unpack lists into separate columns, see the step unpack_list instead.
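To make the row-wise behavior concrete, here is a minimal Python sketch of exploding a single list column. Purely for illustration, it assumes a dataset is a list of dicts (one dict per row); the helper `explode_column` is hypothetical and not part of the step's interface. The sketch emits no rows for an empty list, which may differ from how the actual step handles empty lists.

```python
from typing import Any


def explode_column(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: return one output row per element of the list in `column`."""
    out = []
    for row in rows:
        for item in row[column]:
            # Copy all other columns unchanged and replace the list with one element.
            new_row = dict(row)
            new_row[column] = item
            out.append(new_row)
    return out


ds = [
    {"id": 1, "tags": ["a", "b", "c"]},
    {"id": 2, "tags": ["d"]},
]
ds_out = explode_column(ds, "tags")
print(ds_out)
# [{'id': 1, 'tags': 'a'}, {'id': 1, 'tags': 'b'}, {'id': 1, 'tags': 'c'}, {'id': 2, 'tags': 'd'}]
print(len(ds_out))  # 4 rows, still 2 columns
```

Two input rows become four output rows, while every row keeps the same two columns.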
Usage
The following are the step's expected inputs and outputs and their specific types.
explode(ds: dataset, {"param": value}) -> (ds_out: dataset)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Inputs
ds: dataset
An input dataset having at least one column containing lists.
Outputs
ds_out: dataset
A taller output dataset in which the selected list columns have been exploded (with the default columns: null, no list columns remain).
Parameters
columns: null | string | array[string]
The column or list of columns to explode. Any list columns in the input dataset not mentioned here will not be exploded, i.e. they will remain list columns in the output dataset. If null, the step attempts to explode all columns containing lists.
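The three accepted forms of columns can be normalized to a single list of names. The following Python sketch shows one way to do this; the helper `resolve_columns` is hypothetical, rows are again assumed to be dicts, and treating a column as "containing lists" when any row holds a list in it is an assumption of the sketch, not documented behavior of the step.

```python
from typing import Any, Optional, Union


def resolve_columns(
    rows: list[dict[str, Any]],
    columns: Optional[Union[str, list[str]]],
) -> list[str]:
    """Hypothetical sketch: normalize the `columns` parameter to a list of names."""
    if columns is None:
        # null: select every column holding a list in at least one row (assumption).
        found: list[str] = []
        for row in rows:
            for name, value in row.items():
                if isinstance(value, list) and name not in found:
                    found.append(name)
        return found
    if isinstance(columns, str):
        # A single column name behaves like a one-element array.
        return [columns]
    return list(columns)


ds = [{"id": 1, "tags": ["a", "b"], "scores": [0.1, 0.2]}]
print(resolve_columns(ds, None))        # ['tags', 'scores']
print(resolve_columns(ds, "tags"))      # ['tags']
print(resolve_columns(ds, ["scores"]))  # ['scores']
```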
parallel: boolean = True
Whether to explode the selected columns together. If true, all columns to be exploded are assumed to contain lists of the same length in any given row. In this case, a row containing two lists with 5 elements each will produce 5 rows in the output dataset, the i-th of which pairs the i-th elements of both lists.
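The parallel case amounts to zipping the selected lists element by element. A minimal Python sketch, again assuming list-of-dict rows; `explode_parallel` is a hypothetical helper. Because the documentation only says equal lengths are assumed, this sketch chooses to raise a `ValueError` on mismatched lengths; the real step may behave differently.

```python
from typing import Any


def explode_parallel(rows: list[dict[str, Any]], columns: list[str]) -> list[dict[str, Any]]:
    """Hypothetical sketch: the i-th output row takes the i-th element of each list."""
    out = []
    for row in rows:
        lengths = {len(row[c]) for c in columns}
        if len(lengths) > 1:
            # The sketch's own choice: reject rows whose lists differ in length.
            raise ValueError(f"lists in columns {columns} differ in length: {sorted(lengths)}")
        for items in zip(*(row[c] for c in columns)):
            new_row = dict(row)
            for c, item in zip(columns, items):
                new_row[c] = item
            out.append(new_row)
    return out


ds = [{"id": 1, "a": [1, 2, 3, 4, 5], "b": ["v", "w", "x", "y", "z"]}]
ds_out = explode_parallel(ds, ["a", "b"])
print(len(ds_out))  # 5
print(ds_out[0])    # {'id': 1, 'a': 1, 'b': 'v'}
```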
If false, on the other hand, the columns are exploded one after another. A row containing two lists with 5 elements each will therefore produce 25 rows in the output dataset: exploding the first column produces 5 rows, and when these rows are exploded again using the second column, each of them produces 5 rows in turn (every combination of one element from each list).
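The non-parallel case can be sketched by applying a single-column explode repeatedly, once per selected column, which yields every combination of elements. Both helpers below are hypothetical and assume list-of-dict rows, as in the other sketches.

```python
from typing import Any


def explode_column(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: one output row per element of the list in `column`."""
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)
            new_row[column] = item
            out.append(new_row)
    return out


def explode_iterative(rows: list[dict[str, Any]], columns: list[str]) -> list[dict[str, Any]]:
    """Hypothetical sketch: explode column by column, feeding each result into the next."""
    for column in columns:
        rows = explode_column(rows, column)
    return rows


ds = [{"id": 1, "a": [1, 2, 3, 4, 5], "b": ["v", "w", "x", "y", "z"]}]
ds_out = explode_iterative(ds, ["a", "b"])
print(len(ds_out))  # 25
print(ds_out[:2])   # [{'id': 1, 'a': 1, 'b': 'v'}, {'id': 1, 'a': 1, 'b': 'w'}]
```

Compared with the parallel case, the row count grows multiplicatively (5 × 5 = 25 instead of 5), so this mode can enlarge a dataset considerably when several long lists are exploded.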