Usage
The following examples show how the step can be used in a recipe.
Examples
This draws a sample of 12,000 random rows from the original dataset:
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Inputs
An input dataset to filter.
Outputs
A new dataset containing a random sample of the original rows.
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Parameters
Number of rows to sample.
How many random rows to pick from the original dataset (without replacement). If the value is greater than 1,
it will be interpreted as a count of desired rows. If it is smaller than 1, it will be interpreted as a proportion
of the entire dataset.
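To make the count-versus-proportion rule concrete, here is a minimal Python sketch of the documented behaviour (random sampling without replacement). It is not the step's implementation: the helper names `resolve_sample_size` and `sample_rows` are hypothetical, and three details the text leaves open are assumptions here: a value of exactly 1 is treated as a count, proportions are rounded to the nearest row, and counts larger than the dataset are capped at its size.

```python
import random


def resolve_sample_size(n, total_rows):
    """Translate the 'number of rows' value into a concrete row count.

    Assumptions (not from the step's documentation): values below 1 are
    proportions rounded to the nearest row; anything else is a count,
    capped at the dataset size because sampling is without replacement.
    """
    if n < 1:
        return round(n * total_rows)
    return min(int(n), total_rows)


def sample_rows(rows, n, seed=None):
    """Pick rows at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    k = resolve_sample_size(n, len(rows))
    # random.sample never returns the same element twice
    return rng.sample(rows, k)


rows = list(range(50_000))              # stand-in for a 50,000-row dataset
print(len(sample_rows(rows, 12_000)))   # interpreted as a count
print(len(sample_rows(rows, 0.1)))      # interpreted as a proportion
```

With 50,000 rows, a value of 12,000 yields 12,000 rows, while 0.1 yields 10% of the dataset, i.e. 5,000 rows.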
Options
- number
- integer
When given as a number, values must be in the following range:
Sample independently in these groups.
If a column is specified here, the sampling will be applied separately within each group defined by the unique
values in this column. Combining this with a count of rows to pick (rather than a proportion) allows this step
to balance the dataset, leading to an (approximately) equal number of rows within each group.
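The balancing effect of combining a group column with a row count can be sketched in Python as follows. The helper `sample_by_group` and the toy `segment` column are hypothetical, and the sketch assumes that a group with fewer rows than requested contributes all of its rows, which is one way the result ends up only approximately balanced.

```python
import random
from collections import defaultdict


def sample_by_group(rows, key, n, seed=None):
    """Sample up to n rows independently within each group (hypothetical helper).

    A count (not a proportion) is what balances the groups. Assumption: a
    group smaller than n contributes all of its rows.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


# An imbalanced toy dataset: 900 "A" rows, 80 "B" rows, 20 "C" rows.
rows = (
    [{"segment": "A"} for _ in range(900)]
    + [{"segment": "B"} for _ in range(80)]
    + [{"segment": "C"} for _ in range(20)]
)
balanced = sample_by_group(rows, "segment", 50, seed=42)

counts = defaultdict(int)
for row in balanced:
    counts[row["segment"]] += 1
print(dict(counts))  # {'A': 50, 'B': 50, 'C': 20}
```

Groups A and B are brought to exactly 50 rows each, while group C cannot reach the requested count and keeps its 20 rows.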
A value used to initialize (seed) the random number generator, making the sampling deterministic (reproducible).
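A minimal Python sketch of why a seed makes the sample reproducible: seeding a dedicated generator (`random.Random(seed)`) means the same seed always yields the same draw. The helper name is hypothetical, and the random number generator the step actually uses is not documented here.

```python
import random


def sample_rows(rows, k, seed=None):
    """Draw k rows without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(rows, k)


rows = list(range(1_000))
first = sample_rows(rows, 10, seed=7)
second = sample_rows(rows, 10, seed=7)
print(first == second)  # True: same seed, same sample
```

Using a private `random.Random` instance rather than the module-level functions keeps the draw unaffected by any other code that consumes random numbers.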