explode
Explode (extract) items from column(s) of lists into separate rows.
Each element of an exploded list results in a new row in the resulting dataset, i.e. the transformation creates a taller dataset than the original, but one with the same number of columns.
Note: to unpack lists into separate columns, see the step unpack_list instead.
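To make the row-wise behaviour concrete, here is a minimal pure-Python sketch of exploding a single list column. It is not the step's implementation: the helper `explode_column` and the sample data are hypothetical, and dropping rows with empty lists is an assumption of this sketch, not documented behaviour of the step.

```python
def explode_column(rows, column):
    """Return one output row per element of each row's list in `column`.

    Hypothetical helper for illustration only. Rows whose list is empty
    produce no output rows here, which is an assumption of this sketch.
    """
    out = []
    for row in rows:
        for item in row[column]:
            new_row = dict(row)      # copy all other columns unchanged
            new_row[column] = item   # replace the list with a single element
            out.append(new_row)
    return out


rows = [
    {"id": 1, "tags": ["a", "b", "c"]},
    {"id": 2, "tags": ["d"]},
]
exploded = explode_column(rows, "tags")

for row in exploded:
    print(row)
```

The two input rows become four output rows (three from the first list, one from the second), while the set of columns stays the same.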
Usage
The following shows how the step can be used in a recipe.
General syntax for using the step in a recipe, showing the inputs the step expects and the outputs it produces. For further details see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
The list of columns to explode.
Any list columns in the input dataset not mentioned here will not be exploded, i.e. will remain list columns in the output dataset. If null, attempts to explode all columns containing lists.
Columns to keep in the output dataset.
Specifies which non-exploded columns should be included in the output dataset. If null (default), all non-exploded columns will be included. If a string, only that column will be included. If an array of strings, only those columns will be included. Note that columns specified in explode_by will always be included regardless of this parameter.
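The selection rules above can be sketched as a small pure-Python function. This is an illustration under stated assumptions, not the step's actual code: the function name `select_output_columns`, its arguments, and the choice to preserve the input column order are all hypothetical.

```python
def select_output_columns(all_columns, exploded, keep=None):
    """Decide which columns appear in the output (illustrative sketch).

    all_columns: column names of the input dataset, in order
    exploded:    names of the columns being exploded (always included)
    keep:        None, a single column name, or a list of column names
    """
    if keep is None:
        # Default: every column is kept.
        return list(all_columns)
    if isinstance(keep, str):
        # A single name is treated like a one-element list.
        keep = [keep]
    wanted = set(keep) | set(exploded)
    # Preserving the input order is an assumption of this sketch.
    return [c for c in all_columns if c in wanted]


columns = ["id", "name", "tags"]
print(select_output_columns(columns, ["tags"]))          # keep everything
print(select_output_columns(columns, ["tags"], "id"))    # one kept column
print(select_output_columns(columns, ["tags"], ["name"]))  # list of kept columns
```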
Whether to explode the selected columns together.
If true, assumes that all columns to be exploded have lists of the same length in any given row. In this case, if a row contains two lists with 5 elements each, this will produce 5 rows with matching elements in the output dataset.
If false, on the other hand, the step will explode iteratively, column by column. A row containing two lists with 5 elements each will therefore produce 25 rows in the output dataset, i.e. exploding the first column will produce 5 rows, and when these rows are exploded again using the second column, each will produce 5 rows in turn.
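The difference between the two modes can be illustrated with a short pure-Python sketch. The functions `explode_together` and `explode_iteratively` are hypothetical names used only here; raising an error on mismatched lengths is an assumption of the sketch, since the text above only says that equal lengths are assumed.

```python
from itertools import product


def explode_together(row, columns):
    """Pair up elements position by position (like zip): n lists of length k give k rows."""
    lengths = {len(row[c]) for c in columns}
    if len(lengths) != 1:
        # Assumption of this sketch: mismatched lengths are treated as an error.
        raise ValueError("lists must have the same length to be exploded together")
    out = []
    for values in zip(*(row[c] for c in columns)):
        new_row = dict(row)
        new_row.update(zip(columns, values))
        out.append(new_row)
    return out


def explode_iteratively(row, columns):
    """Explode column by column, which yields every combination of elements."""
    out = []
    for values in product(*(row[c] for c in columns)):
        new_row = dict(row)
        new_row.update(zip(columns, values))
        out.append(new_row)
    return out


row = {"id": 1, "x": [1, 2, 3, 4, 5], "y": ["a", "b", "c", "d", "e"]}

together = explode_together(row, ["x", "y"])
iterative = explode_iteratively(row, ["x", "y"])

print(len(together))   # 5
print(len(iterative))  # 25
```

Exploding together keeps matching elements side by side (1 with "a", 2 with "b", and so on), whereas the iterative mode pairs every element of the first list with every element of the second.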
The Graphext advanced query used to identify the rows to select prior to the grouping.