Upsample¶
Upsample a dataset given a weight column.
When dealing with surveys, it's common to want your sample to reflect a specific demographic. When the sample itself cannot achieve this representation, you'd usually assign a strictly positive weight to each row, reflecting how representative it is of your desired population. This step takes these precomputed weights and repeats each row a number of times in proportion to its weight, so that the resulting dataset reflects your target population.
Usage¶
The following are the step's expected inputs and outputs and their specific types.
upsample(ds_in: dataset, {"param": value}) -> (ds_out: dataset)
where the object {"param": value}
is optional in most cases and if present may contain any of the parameters described in the
corresponding section below.
Example¶
The following example creates a new dataset with the proportions specified by the weight_name column:
upsample(ds, {"weights": "weight_name"}) -> (ds_upsampled)
More examples
Same as before, but ensures at least 3 occurrences of the least weighted row.
upsample(ds, {"weights": "weight_name", "n_samples_min": 3}) -> (ds_upsampled)
Inputs¶
ds_in: dataset
An input dataset to upsample.
Outputs¶
ds_out: dataset
A new dataset containing the desired proportions.
Parameters¶
weights: string
Name of the column to use as weights.
n_samples_min: integer = 1
Number of samples given to the least weighted set of rows.
Range: 1 ≤ n_samples_min < inf
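To make the interaction between weights and n_samples_min more concrete, here is a minimal Python sketch of weight-proportional row repetition. It is not the step's actual implementation: the function name upsample_rows, the list-of-dicts representation of a dataset, and the rounding rule (copies scaled relative to the smallest weight) are all assumptions made for illustration.

```python
from typing import Any


def upsample_rows(
    rows: list[dict[str, Any]],
    weights: str,
    n_samples_min: int = 1,
) -> list[dict[str, Any]]:
    """Repeat each row in proportion to its weight (illustrative sketch only)."""
    if n_samples_min < 1:
        raise ValueError("n_samples_min must be at least 1")
    if not rows:
        return []
    if any(row[weights] <= 0 for row in rows):
        raise ValueError("weights must be strictly positive")

    min_weight = min(row[weights] for row in rows)
    upsampled = []
    for row in rows:
        # Assumption: the least weighted rows get n_samples_min copies, and
        # every other row is scaled relative to them and rounded to a whole count.
        copies = max(n_samples_min, round(row[weights] / min_weight * n_samples_min))
        upsampled.extend(dict(row) for _ in range(copies))
    return upsampled


# Hypothetical survey data with a precomputed weight column "w".
survey = [
    {"group": "a", "w": 0.5},
    {"group": "b", "w": 1.0},
    {"group": "c", "w": 1.5},
]

counts: dict[str, int] = {}
for row in upsample_rows(survey, weights="w", n_samples_min=2):
    counts[row["group"]] = counts.get(row["group"], 0) + 1

print(counts)  # {'a': 2, 'b': 4, 'c': 6}
```

Under these assumptions, raising n_samples_min increases the size of the output while keeping the relative proportions between rows, which can reduce rounding error when the weights are not whole multiples of the smallest one.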