add_noise
Add noise to a column with numbers or lists of numbers.
Given a distribution name with scale and loc parameters, the step optionally applies an additional scaling to the sampled noise through the relative parameter, based either on the standard deviation of the column or proportionally to each point, in order to preserve the underlying structure of the data. The computation is then carried out as follows:
new value = original value + relative scaling factor * random sample from the distribution.
If the relative parameter is not given, or is set to abs, the relative scaling factor is 1.
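As a rough illustration of the computation described above, the following Python sketch uses NumPy's random Generator as a stand-in for the step's sampler. The function name add_noise, its keyword arguments, and the handling of list-valued columns (passed as a 2-D array, with "std" computed over all entries using the population standard deviation) are assumptions for illustration, not the step's actual implementation. Numeric values of relative are not modelled, because their exact behaviour is not specified on this page.

```python
import numpy as np

# Distribution names accepted by this sketch; each is also a method on
# numpy.random.Generator taking (loc, scale, size).
DISTRIBUTIONS = ("gumbel", "laplace", "logistic", "normal")


def add_noise(values, distribution="normal", loc=0.0, scale=1.0,
              relative="abs", seed=None):
    """Hypothetical sketch of: new = original + relative factor * sample.

    values: 1-D array of numbers, or 2-D array (one list of numbers per row,
            e.g. an embedding column).
    relative: "abs" -> factor 1 (assumed default);
              "std" -> factor = population std of all entries (assumption).
    """
    if distribution not in DISTRIBUTIONS:
        raise ValueError(f"unknown distribution: {distribution!r}")
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw one noise sample per entry, with the same shape as the input.
    sample = getattr(rng, distribution)(loc, scale, size=x.shape)
    if relative == "abs":
        factor = 1.0
    elif relative == "std":
        factor = x.std()
    else:
        # Numeric modes are not modelled in this sketch.
        raise ValueError(f"unsupported relative mode: {relative!r}")
    return x + factor * sample


# Usage: white (normal) noise on a small, made-up embedding column.
emb = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
noisy = add_noise(emb, distribution="normal", scale=0.01, seed=0)
```

Passing a 2-D array keeps each embedding's length unchanged, since one sample is drawn per entry.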
Usage
The following examples show how the step can be used in a recipe.
Add white noise to a column of embeddings
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Scaling mode to use. Either set to “std” to scale the noise by the standard deviation of the column, or use a number to scale the sampling.
Distribution function that the noise is sampled from.
Values must be one of the following:
gumbel
laplace
logistic
normal
Location (“centre”) of the chosen distribution. For the normal, laplace and logistic distributions this is the mean.
Scale (spread or “width”) of the distribution. For the normal distribution this is the standard deviation.
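Only for the normal distribution do the location and scale coincide with the mean and standard deviation. Assuming the step uses the same loc/scale parameterisation as NumPy's Generator (not confirmed here), the sketch below computes the mean and standard deviation implied by a given loc and scale, using the standard closed-form formulas for each listed distribution. The function name moments is hypothetical.

```python
import math

# Euler–Mascheroni constant, used in the mean of the Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def moments(distribution, loc, scale):
    """Return (mean, standard deviation) implied by loc and scale."""
    if distribution == "normal":
        return loc, scale
    if distribution == "laplace":
        return loc, math.sqrt(2.0) * scale
    if distribution == "logistic":
        return loc, math.pi * scale / math.sqrt(3.0)
    if distribution == "gumbel":
        # Maximum-type Gumbel: the mean is shifted above loc.
        return loc + EULER_GAMMA * scale, math.pi * scale / math.sqrt(6.0)
    raise ValueError(f"unknown distribution: {distribution!r}")
```

For example, a laplace distribution with scale 1 produces noise with a standard deviation of about 1.41, so matching a target spread across distributions requires adjusting scale accordingly.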
Seed for the random number generator. Set it if you need reproducible results.
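To illustrate why a fixed seed gives reproducible noise, this sketch uses NumPy's default_rng as a stand-in; the random number generator the step uses internally is not documented here, and the helper name sample_noise is hypothetical.

```python
import numpy as np


def sample_noise(size, seed=None):
    # A fresh generator per call: the same seed always yields the same stream.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size)


a = sample_noise(5, seed=42)
b = sample_noise(5, seed=42)
c = sample_noise(5, seed=7)
print(np.array_equal(a, b))  # True
```

Leaving the seed unset draws fresh entropy on every run, so the added noise differs each time.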