upsample
Upsample a dataset given a weight column.
When dealing with surveys, it’s common to want your sample to reflect a specific demographic. When this ideal representation cannot be achieved, you’d usually assign a strictly positive weight to each row reflecting how representative it is of your desired population. This step takes these precomputed weights and repeats each row a number of times proportional to its weight, so that the resulting dataset reflects your target population.
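As a rough illustration of the idea, the Python sketch below repeats each row in proportion to its weight. It is not this step's actual implementation: the function name `upsample`, the row-as-dict representation and the rounding rule (the least-weighted row is kept once, every other row is repeated `round(weight / min_weight)` times) are assumptions made for the example.

```python
from typing import Any


def upsample(rows: list[dict[str, Any]], weight_col: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: repeat each row in proportion to its weight.

    The least-weighted row is kept once; every other row is repeated
    round(weight / min_weight) times (Python rounds halves to even).
    """
    weights = [row[weight_col] for row in rows]
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    w_min = min(weights)
    out: list[dict[str, Any]] = []
    for row, w in zip(rows, weights):
        # w / w_min >= 1, so every row is kept at least once.
        copies = round(w / w_min)
        # Copy each repeated row so the output rows are independent objects.
        out.extend(dict(row) for _ in range(copies))
    return out


survey = [
    {"id": "a", "weight": 0.5},
    {"id": "b", "weight": 1.0},
    {"id": "c", "weight": 1.5},
]
result = upsample(survey, "weight")
print([r["id"] for r in result])  # ['a', 'b', 'b', 'c', 'c', 'c']
```

Because the repeat counts are rounded to whole rows, the output proportions only approximate the weights when the weight ratios are not integers.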
Usage
The following examples show how the step can be used in a recipe.
The following example creates a new dataset with the proportions specified by weight_name.
Same as before, but ensures that the row with the lowest weight appears at least 3 times.
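One plausible way to guarantee a minimum number of occurrences is to scale every weight by the same factor, so that the least-weighted row maps to that minimum. The sketch below computes per-row repeat counts this way; the function `repeat_counts` and its `min_count` parameter are hypothetical names, not the step's documented configuration.

```python
def repeat_counts(weights: list[float], min_count: int = 1) -> list[int]:
    """Hypothetical sketch: copies per row so the least-weighted row appears min_count times."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    w_min = min(weights)
    # Scale all weights by min_count / w_min, then round to whole copies.
    return [round(min_count * w / w_min) for w in weights]


print(repeat_counts([1.0, 1.4], min_count=1))  # [1, 1]
print(repeat_counts([1.0, 1.4], min_count=3))  # [3, 4]
```

Raising the minimum also reduces rounding error: with a minimum of 1, weights of 1.0 and 1.4 both become a single copy and their ratio is lost, whereas a minimum of 3 yields 3 and 4 copies, which is closer to the intended 1:1.4 ratio at the cost of a larger output.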
General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).