This step unpivots a dataset from wide to long format. One or more columns are kept as identifier variables (id_vars), while the selected value columns (value_vars) are stacked into two new columns: one holding the name of the original column and one holding its value. The result is a new dataset in which each original row appears once per value column, identified by the unchanged identifier columns. For more information, refer to the pandas melt documentation.

Usage

The following example shows how the step can be used in a recipe.

Examples

Given a dataset sales with columns for date, product_id, and sales_amount, we can use melt to transform this dataset into a long format. This transformation creates a new dataset long_sales in which each row holds a single value from one of the melted columns for a given date, which facilitates further analysis or visualization. The resulting dataset long_sales will have the following columns:
  • date: the date of the observation
  • variable: the name of the column the value came from (product_id or sales_amount)
  • value: the value of that column on the given date
melt(sales, {
  "id_vars": ["date"],
  "value_vars": ["product_id", "sales_amount"],
  "var_name": "variable",
  "value_name": "value"
}) -> (long_sales)
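Since this page points to the pandas melt documentation, the recipe example can be approximated directly in pandas. The following is a minimal sketch, assuming the step behaves like pandas.melt; the sample rows in sales are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data with the columns named in the example above.
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "product_id": ["A", "B"],
    "sales_amount": [10.0, 20.0],
})

# Keep "date" as the identifier and stack the two value columns.
long_sales = pd.melt(
    sales,
    id_vars=["date"],
    value_vars=["product_id", "sales_amount"],
    var_name="variable",
    value_name="value",
)

# Each input row now appears once per melted column.
print(long_sales)
```

Because product_id and sales_amount have different types, the value column in this sketch mixes strings and numbers. Melting columns of the same type usually gives a result that is easier to analyze.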

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in
dataset
required
melted
dataset
required
Result of melting ds_in.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

id_vars
[string, array[string]]
required
Identifier columns. The column(s) to use as identifier variables.
Item
string (ds_in.column)
Each item in array.
  • order_id
  • ["customer_id", "order_id"]
value_vars
[string, array[string]]
required
Value columns. The column(s) that are considered as value variables.
Item
string (ds_in.column)
Each item in array.
  • quantity
  • ["price", "quantity", "discount"]
var_name
string
default:"variable"
Variable column name. Name of the column that holds the original column names. If not provided, the name "variable" is used.
value_name
string
default:"value"
Value column name. Name of the column that holds the melted values. If not provided, the name "value" is used.
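The parameters above can be illustrated with pandas.melt, which accepts the same four names. This is a sketch under the assumption that the step passes them through unchanged; the orders dataset and the custom names "measure" and "amount" are hypothetical.

```python
import pandas as pd

# Hypothetical dataset using column names from the parameter examples.
orders = pd.DataFrame({
    "order_id": [1, 2],
    "price": [9.5, 20.0],
    "quantity": [3, 1],
})

# id_vars given as a single string; var_name and value_name left at their defaults.
with_defaults = pd.melt(orders, id_vars="order_id", value_vars=["price", "quantity"])

# value_vars given as a single string, with custom output column names.
with_custom_names = pd.melt(
    orders,
    id_vars="order_id",
    value_vars="quantity",
    var_name="measure",
    value_name="amount",
)

print(with_defaults.columns.tolist())
print(with_custom_names.columns.tolist())
```

In this sketch a single column name and a one-element list are treated the same way, which matches the [string, array[string]] type shown for id_vars and value_vars.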