melt
Reshape a dataset by transforming columns into rows.
This step unpivots a dataset from wide to long format. The identifier columns are kept as they are, while the value columns are turned into rows. Each input row produces one output row per value column. That output row holds the name of the column in a new variable column and the column's content in a new value column. The result is a longer, narrower dataset, which is often easier to filter, aggregate or plot. For more information, refer to the pandas melt documentation.
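The following minimal pure-Python sketch makes the melt operation concrete. It is not the step's implementation. It treats a dataset as a list of dicts and mirrors the semantics of pandas.melt: identifier columns are copied, every value column becomes rows, and the default output names are "variable" and "value". The parameter names (id_vars, value_vars, var_name, value_name) are borrowed from pandas and are assumptions here. The table with one column per product (p1, p2) is hypothetical.

```python
from typing import Any, Optional, Sequence

Row = dict[str, Any]


def melt(
    rows: Sequence[Row],
    id_vars: Sequence[str] = (),
    value_vars: Optional[Sequence[str]] = None,
    var_name: str = "variable",
    value_name: str = "value",
) -> list[Row]:
    """Unpivot wide rows into long rows: one output row per (value column, input row)."""
    if value_vars is None:
        # Mirror pandas.melt: default to every column that is not an identifier.
        columns = list(rows[0]) if rows else []
        value_vars = [c for c in columns if c not in id_vars]
    long_rows: list[Row] = []
    # Outer loop over value columns, like pandas.melt, which stacks the
    # values of one column after another.
    for col in value_vars:
        for row in rows:
            out = {k: row[k] for k in id_vars}  # identifiers are copied unchanged
            out[var_name] = col                 # name of the column the value came from
            out[value_name] = row[col]          # the value itself
            long_rows.append(out)
    return long_rows


# Hypothetical wide table: one sales column per product.
wide = [
    {"date": "2024-01-01", "p1": 10, "p2": 5},
    {"date": "2024-01-02", "p1": 7, "p2": 3},
]
for r in melt(wide, id_vars=["date"]):
    print(r)
```

Two wide rows with two value columns give four long rows. With pandas, the equivalent call would be `pd.melt(df, id_vars=["date"])`.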
Usage
The following example shows how the step can be used in a recipe.
Examples
Given a dataset sales with columns for date, product_id, and sales_amount, we can use melt to transform this dataset into a long format. This transformation will create a new dataset long_sales where each row represents a single observation of sales amount for a product on a given date, facilitating further analysis or visualization.
The resulting dataset long_sales will have the following columns:
- date: the date of the observation
- variable: indicates the product by its product_id
- value: the sales amount for that product on the given date
General syntax for using the step in a recipe. It shows the inputs the step expects to receive and the outputs it will produce. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
Outputs
Result of melting ds.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
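As a hedged illustration of that calling convention, the sketch below builds a melt configuration in Python and renders it as a recipe line. The keys id_vars, value_vars, var_name and value_name are hypothetical. They are modelled on pandas.melt because this page does not show the step's actual key names. The dataset names ds and ds_long are placeholders.

```python
import json

# Hypothetical keys modelled on pandas.melt; the step's actual key names may differ.
config = {
    "id_vars": ["customer_id", "order_id"],
    "value_vars": ["price", "quantity", "discount"],
    "var_name": "metric",
    "value_name": "amount",
}

# The configuration object is passed as the last input, after the dataset.
recipe_line = f"melt(ds, {json.dumps(config)}) -> (ds_long)"
print(recipe_line)
```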
Parameters
Identifier columns. The column(s) to use as identifier variables. Their values are kept unchanged and repeated in every resulting row.
Array items
Each item in array.
Examples
- order_id
- ['customer_id', 'order_id']
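The two examples show that identifiers can be given either as a single column name or as a list of names. pandas.melt likewise accepts a scalar or a list for id_vars. The sketch below normalises both forms to a list. as_column_list is a hypothetical helper for illustration and is not part of the step.

```python
from typing import Sequence, Union


def as_column_list(cols: Union[str, Sequence[str]]) -> list[str]:
    """Normalise a single column name or a list of names to a list of names."""
    # A bare string is itself a Sequence[str], so check for it first;
    # otherwise "order_id" would be split into single characters.
    if isinstance(cols, str):
        return [cols]
    return list(cols)


print(as_column_list("order_id"))                   # single identifier
print(as_column_list(["customer_id", "order_id"]))  # composite identifier
```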
Value columns. The column(s) to unpivot into rows as value variables.
Array items
Each item in array.
Examples
- quantity
- ['price', 'quantity', 'discount']
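In pandas.melt, when no value columns are given, every column that is not an identifier is unpivoted. This page does not say whether the step shares that default, so the sketch below treats it as an assumption. resolve_value_columns is a hypothetical helper. The sketch also shows how the row count grows: each input row yields one output row per value column.

```python
from typing import Optional, Sequence


def resolve_value_columns(
    columns: Sequence[str],
    id_vars: Sequence[str],
    value_vars: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return the columns that will be unpivoted into rows."""
    if value_vars is not None:
        return list(value_vars)
    # Assumed default (as in pandas.melt): every column that is not an identifier.
    ids = set(id_vars)
    return [c for c in columns if c not in ids]


columns = ["order_id", "price", "quantity", "discount"]
value_cols = resolve_value_columns(columns, ["order_id"])
n_input_rows = 3
# Each input row yields one output row per value column.
print(value_cols, n_input_rows * len(value_cols))
```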
Variable column name. Name of the variable column. If not provided, the name 'variable' is used.
Value column name. Name of the value column. If not provided, the name 'value' is used.
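The sketch below illustrates how the two name parameters shape the output. It assumes the same column order pandas.melt produces: identifiers first, then the variable column, then the value column. long_columns is a hypothetical helper, and the names "metric" and "amount" are illustrative.

```python
from typing import Optional, Sequence


def long_columns(
    id_vars: Sequence[str],
    var_name: Optional[str] = None,
    value_name: Optional[str] = None,
) -> list[str]:
    """Column layout of the melted dataset: identifiers, then variable and value."""
    # Fall back to the documented defaults when a name is not provided.
    return [*id_vars, var_name or "variable", value_name or "value"]


print(long_columns(["order_id"]))                      # default names
print(long_columns(["order_id"], "metric", "amount"))  # custom names
```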