
Upsample

Upsample a dataset given a weight column.

When working with survey data, you often want your sample to reflect a specific population. When the sample itself is not representative, a common fix is to assign each row a strictly positive weight indicating how representative it is of the target population. This step takes those precomputed weights and repeats each row a number of times proportional to its weight, so that the resulting dataset reflects the target population.
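The idea of repeating rows in proportion to their weight can be sketched in a few lines of Python. This is only an illustration, not the step's actual implementation: the function name `upsample_rows`, the rounding rule, and the choice to scale against the smallest weight are all assumptions made for the example.

```python
def upsample_rows(rows, weights, n_samples_min=1):
    """Repeat each row in proportion to the value in its weight column.

    Assumption (not confirmed by the step's documentation): the row with
    the smallest weight is repeated n_samples_min times, and every other
    row is repeated round(weight / min_weight * n_samples_min) times.
    """
    if n_samples_min < 1:
        raise ValueError("n_samples_min must be at least 1")
    if not rows:
        return []
    if any(row[weights] <= 0 for row in rows):
        raise ValueError("weights must be strictly positive")

    min_weight = min(row[weights] for row in rows)
    out = []
    for row in rows:
        # Number of copies for this row, relative to the least weighted row.
        copies = round(row[weights] / min_weight * n_samples_min)
        out.extend([row] * copies)
    return out


# Hypothetical survey rows with a precomputed weight column "w".
rows = [
    {"id": "a", "w": 1.0},
    {"id": "b", "w": 2.0},
    {"id": "c", "w": 0.5},
]
upsampled = upsample_rows(rows, "w")
ids = [row["id"] for row in upsampled]
```

Here row `c` has the smallest weight and appears once, while `a` and `b` appear two and four times respectively, matching the 1 : 2 : 0.5 ratio of the weights.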

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
upsample(ds_in: dataset, {
    "param": value
}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.

Example

The following example creates a new dataset with the proportions specified by the weight_name column.

Example call (in recipe editor)
upsample(ds, {"weights": "weight_name"}) -> (ds_upsampled)

More examples

Same as before, but ensures at least 3 occurrences of the least weighted row.

Example call (in recipe editor)
upsample(ds, {"weights": "weight_name", "n_samples_min": 3}) -> (ds_upsampled)

Inputs


ds_in: dataset

An input dataset to upsample.

Outputs


ds_out: dataset

A new dataset containing the desired proportions.

Parameters


weights: string

Name of column to be used as weights.


n_samples_min: integer = 1

Minimum number of occurrences given to the rows with the smallest weight.

Range: 1 ≤ n_samples_min < inf
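
One plausible reason to raise n_samples_min is that rows can only be repeated a whole number of times, so non-integer weight ratios get rounded. The sketch below assumes (this is not stated by the step itself) that repeat counts are computed relative to the smallest weight and rounded half up. The helper names `repeat_counts` and `shares` are hypothetical.

```python
import math


def repeat_counts(weights, n_samples_min=1):
    """Whole-number repeat counts, assuming scaling against the smallest weight."""
    min_weight = min(weights)
    # floor(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
    return [math.floor(w / min_weight * n_samples_min + 0.5) for w in weights]


def shares(values):
    """Fraction of the total contributed by each value."""
    total = sum(values)
    return [v / total for v in values]


weights = [1.0, 1.5]                   # target shares: 40% and 60%
counts_1 = repeat_counts(weights, 1)   # 1.5 copies is not possible, so it is rounded
counts_2 = repeat_counts(weights, 2)   # doubling makes both counts whole numbers
```

Under these assumptions, n_samples_min = 1 yields counts of 1 and 2 (shares of about 33% and 67%), while n_samples_min = 2 yields 2 and 3, which reproduces the 40% and 60% target exactly at the cost of a larger output dataset.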