upsample

When working with survey data, you often want your sample to reflect a specific target population. When the sample itself is not representative, the usual remedy is to assign each row a strictly positive weight reflecting how representative it is of that population. This step takes these precomputed weights and repeats each row a number of times in proportion to its weight, so that the resulting dataset mirrors the composition of the target population.
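To make the idea concrete, here is a minimal Python sketch of weight-proportional row repetition. It is not the step's actual implementation: the function name `upsample`, the list-of-dicts representation, and in particular the rounding rule (scale so the smallest weight maps to `n_samples_min` copies, then round to the nearest integer) are assumptions made for illustration.

```python
from typing import Any


def upsample(
    rows: list[dict[str, Any]], weights: str, n_samples_min: int = 1
) -> list[dict[str, Any]]:
    """Repeat each row in proportion to its weight (illustrative sketch only)."""
    if n_samples_min < 1:
        raise ValueError("n_samples_min must be >= 1")
    if not rows:
        return []
    values = [row[weights] for row in rows]
    if any(v <= 0 for v in values):
        raise ValueError("weights must be strictly positive")
    smallest = min(values)
    result = []
    for row, value in zip(rows, values):
        # Assumption: the smallest weight maps to n_samples_min copies and
        # every other count is rounded to the nearest integer.
        copies = max(n_samples_min, round(value / smallest * n_samples_min))
        result.extend(dict(row) for _ in range(copies))
    return result


# Hypothetical survey rows with a precomputed "weight" column.
survey = [
    {"respondent": "a", "weight": 1.0},
    {"respondent": "b", "weight": 2.0},
    {"respondent": "c", "weight": 3.0},
]
upsampled = upsample(survey, weights="weight")
print([row["respondent"] for row in upsampled])  # ['a', 'b', 'b', 'c', 'c', 'c']
```

In this sketch the row with weight 3.0 ends up three times as frequent as the row with weight 1.0, which is the effect the weights are meant to encode.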

Usage

The following examples show how the step can be used in a recipe.

Examples

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

Outputs

Configuration

The following parameters configure the behaviour of the step. Pass them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

weights

string (ds_in.column:number)

required

Name of column to be used as weights.

n_samples_min

integer

default: 1

Number of times the rows with the smallest weight are repeated. Rows with larger weights are repeated proportionally more often.

Values must be in the following range:

1 ≤ n_samples_min < inf
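The following Python sketch illustrates how n_samples_min could affect the result when weight ratios are not whole numbers. The helper `repeat_counts` and its rounding rule are hypothetical and chosen only for illustration; the step's actual rounding behaviour is not documented here.

```python
def repeat_counts(weights: list[float], n_samples_min: int = 1) -> list[int]:
    """Return how many copies each row would get (illustrative sketch only)."""
    smallest = min(weights)
    # Assumption: scale so the smallest weight gets n_samples_min copies,
    # then round every count to the nearest integer.
    return [max(n_samples_min, round(w / smallest * n_samples_min)) for w in weights]


weights = [1.0, 1.25]
print(repeat_counts(weights, n_samples_min=1))  # [1, 1]
print(repeat_counts(weights, n_samples_min=4))  # [4, 5]
```

Under these assumptions, a larger n_samples_min preserves the 1 : 1.25 ratio exactly (4 : 5), at the cost of a larger output dataset, whereas the default of 1 rounds both rows to a single copy.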
