When dealing with surveys, it’s common to want your sample to reflect a specific demographic. When this ideal representation cannot be achieved, you’d usually assign a strictly positive weight to each row, reflecting how representative it is of your desired population. This step takes these precomputed weights and repeats each row a number of times in proportion to its weight, so that the resulting dataset reflects your target population.
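
The idea of repeating rows in proportion to their weight can be illustrated with a minimal Python sketch. This is not the step's implementation: the function name `upsample_by_weight`, the sample data, and the rule of scaling relative to the smallest weight and rounding to whole copies are all assumptions made for illustration.

```python
def upsample_by_weight(rows, weight_key):
    """Repeat each row in proportion to its weight (illustrative sketch only)."""
    weights = [row[weight_key] for row in rows]
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    w_min = min(weights)
    out = []
    for row in rows:
        # Assumption: the least-weighted row appears once and every other
        # row is repeated relative to it, rounded to a whole number of copies.
        n_copies = max(1, round(row[weight_key] / w_min))
        out.extend(dict(row) for _ in range(n_copies))
    return out


# Hypothetical survey data with precomputed weights.
survey = [
    {"group": "under_30", "weight": 1.0},
    {"group": "30_to_60", "weight": 2.0},
    {"group": "over_60", "weight": 3.0},
]
upsampled = upsample_by_weight(survey, "weight")

# Count how often each group appears after upsampling.
counts = {}
for row in upsampled:
    counts[row["group"]] = counts.get(row["group"], 0) + 1
print(counts)
```

In this sketch the group with weight 3.0 ends up with three times as many rows as the group with weight 1.0, which is the proportionality described above.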

Usage

The following examples show how the step can be used in a recipe.

Examples

The following example creates a new dataset with the proportions specified by the weight_name column:
upsample(ds, {"weights": "weight_name"}) -> (ds_upsampled)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
ds_in (dataset, required): An input dataset to upsample.
ds_out (dataset, required): A new dataset containing the desired proportions.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

weights (string, ds_in.column:number, required): Name of the column to be used as weights.
n_samples_min (integer, default: 1): Number of samples given to the least-weighted set of rows. Values must be in the following range: 1 ≤ n_samples_min < inf
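
How n_samples_min translates into per-row repeat counts is not spelled out here. One plausible reading, sketched below in Python purely as an assumption, is that the least-weighted rows receive n_samples_min copies and every other row is scaled relative to them. The helper `repeat_counts` and the rounding rule are hypothetical and not part of the step's API.

```python
def repeat_counts(weights, n_samples_min=1):
    """Hypothetical helper: copies per row under an assumed scaling rule."""
    if n_samples_min < 1:
        raise ValueError("n_samples_min must be at least 1")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    w_min = min(weights)
    # Assumption: the least-weighted rows get n_samples_min copies and every
    # other row is scaled relative to them, rounded to a whole number.
    return [round(w / w_min * n_samples_min) for w in weights]


weights = [1.0, 1.25, 2.25]
coarse = repeat_counts(weights, n_samples_min=1)
fine = repeat_counts(weights, n_samples_min=4)
print(coarse, fine)
```

Under this assumed rule, a larger n_samples_min lets fractional weight ratios be matched more closely, at the cost of a larger output dataset.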