Upsample a dataset given a weight column.
When working with surveys, you often want your sample to reflect a specific demographic. When the sample itself cannot be made representative, you’d usually assign a strictly positive weight to each row reflecting how representative it is of your desired population. This step takes these precomputed weights and repeats each row a number of times in proportion to its weight, so that the resulting dataset reflects your target population.
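The idea of repeating rows in proportion to their weight can be sketched in plain Python. This is only an illustration and not the step's actual implementation: the function name upsample_by_weight, the sample data, and the rule of repeating each row round(weight / smallest weight) times are all assumptions made for the example.

```python
from collections import Counter
from typing import Any


def upsample_by_weight(rows: list[dict[str, Any]], weight_name: str) -> list[dict[str, Any]]:
    """Repeat each row in proportion to its weight (hypothetical helper)."""
    if not rows:
        return []
    weights = [row[weight_name] for row in rows]
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    smallest = min(weights)
    upsampled = []
    for row, weight in zip(rows, weights):
        # Assumption: the least-weighted row appears once and every other row
        # is repeated relative to it, rounded to a whole number of copies.
        copies = max(1, round(weight / smallest))
        upsampled.extend(dict(row) for _ in range(copies))
    return upsampled


# Made-up survey rows with precomputed weights.
survey = [
    {"respondent": "a", "weight": 0.5},
    {"respondent": "b", "weight": 1.0},
    {"respondent": "c", "weight": 1.5},
]
result = upsample_by_weight(survey, "weight")
counts = Counter(row["respondent"] for row in result)
print(dict(counts))
```

With these weights the rows appear in a 1:2:3 ratio. Because copies are rounded to whole numbers, the resulting proportions only approximate the weights when the weights are not close to multiples of the smallest one.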
The following examples show how the step can be used in a recipe.
Examples
The following example creates a new dataset with the proportions specified by weight_name.
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
Inputs
An input dataset to upsample.
Outputs
A new dataset containing the desired proportions.
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).