clean_categories
Clean a given column of categories or lists of categories using OpenAI.
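This page does not describe how the step works internally. As a rough illustration only, assume the model returns a mapping from each original category to a cleaned one. The Python sketch below shows how such a mapping could then be applied to a column whose cells hold either a single category or a list of categories. Several things here are assumptions, not documented behaviour of the step: the function name `apply_mapping`, the removal of duplicates inside lists, and leaving unmapped values unchanged.

```python
from typing import Dict, List, Union

# A cell holds one category, a list of categories, or nothing.
Cell = Union[str, List[str], None]


def apply_mapping(values: List[Cell], mapping: Dict[str, str]) -> List[Cell]:
    """Map each raw category to its cleaned category (hypothetical helper).

    Single categories are mapped directly. Lists are mapped element-wise and
    then de-duplicated while preserving order (an assumption of this sketch).
    Categories missing from the mapping are passed through unchanged.
    """
    cleaned: List[Cell] = []
    for value in values:
        if value is None:
            cleaned.append(None)
        elif isinstance(value, list):
            mapped = [mapping.get(v, v) for v in value]
            # dict.fromkeys keeps first-seen order while dropping duplicates.
            cleaned.append(list(dict.fromkeys(mapped)))
        else:
            cleaned.append(mapping.get(value, value))
    return cleaned


# Hypothetical mapping, standing in for what the model might propose.
mapping = {"NYC": "New York", "new york": "New York", "SF": "San Francisco"}
column = ["NYC", ["new york", "SF", "NYC"], None, "Boston"]
print(apply_mapping(column, mapping))
# ['New York', ['New York', 'San Francisco'], None, 'Boston']
```

Keeping list cells as lists means the "lists of categories" shape of the column is preserved after cleaning.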
Usage
The following example shows how the step can be used in a recipe.
Examples
Bring the number of categories down to 10 categories
General syntax for using the step in a recipe, showing the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
original column: the column of categories, or lists of categories, to clean.
Outputs
cleaned column: the resulting column of cleaned categories.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Associated integration, i.e. the OpenAI integration used by the step.
Categories desired in the result column. If passed, the instructions are ignored.
Array items
Each item in the array.
Further instructions for generating the desired set of categories. You can ask for things like ‘generate around 5 categories’.
Approximate number of categories to generate. Can be a float from 0 to 1 (a fraction of the number of unique categories) or an integer greater than 1.
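The parameters above interact in two ways. Explicit categories take precedence over instructions. The approximate number of categories can be given either as a fraction or as an absolute count. The Python sketch below is one plausible reading of those rules and is not the step's actual implementation. The helper names (`resolve_target_count`, `build_request`), the returned dictionary keys, the rounding to the nearest whole number with a minimum of one, and dropping the target count when categories are given are all assumptions made for illustration.

```python
from typing import List, Optional, Union


def resolve_target_count(n_categories: Union[int, float], n_unique: int) -> int:
    """Turn the 'approximate number of categories' setting into a count.

    A float in (0, 1] is read as a fraction of the unique categories; an
    integer greater than 1 is used as-is. Rounding is an assumption.
    """
    if isinstance(n_categories, bool):
        raise TypeError("expected a number, not a bool")
    if isinstance(n_categories, float) and 0 < n_categories <= 1:
        # Nearest whole number of categories, but never fewer than one.
        return max(1, round(n_categories * n_unique))
    if isinstance(n_categories, int) and n_categories > 1:
        return n_categories
    raise ValueError("expected a float in (0, 1] or an integer greater than 1")


def build_request(
    n_unique: int,
    categories: Optional[List[str]] = None,
    instructions: Optional[str] = None,
    n_categories: Optional[Union[int, float]] = None,
) -> dict:
    """Hypothetical request builder illustrating parameter precedence."""
    if categories:
        # Documented rule: when categories are passed, instructions are ignored.
        # Assumption: the target count is not needed either in this case.
        return {"mode": "fixed", "categories": list(categories)}
    request = {"mode": "generate", "instructions": instructions}
    if n_categories is not None:
        request["target_count"] = resolve_target_count(n_categories, n_unique)
    return request


# A column with 250 unique categories, asking for roughly 10% of them.
print(build_request(250, instructions="merge near-duplicates", n_categories=0.1))
# {'mode': 'generate', 'instructions': 'merge near-duplicates', 'target_count': 25}
```

With 250 unique categories, `0.1` and `25` therefore describe the same target under this reading.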
Model Configuration. Configuration for OpenAI’s model.
Properties
The OpenAI model to use.
Values must be one of the following:
gpt-4o
gpt-4o-mini
o3-mini
gpt-4.1
gpt-4.1-mini
gpt-4.1-nano
Temperature. Higher values make the output more creative but also make the model more likely to hallucinate; lower values yield more deterministic results.
Values must be in the following range: