Adds random noise to a column. The noise is sampled from the distribution named by dist_name, parameterized by loc and scale. Optionally, the relative parameter applies an additional scaling to the noise, based either on the standard deviation of the column or proportionally to each individual value, in order to preserve the underlying structure of the data. Each new value is then computed as follows: new value = original value + relative scaling factor * random sample from the distribution. If relative is not given or is set to "abs", the relative scaling factor is 1.
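The computation above can be sketched in Python with NumPy. This is a minimal, hypothetical re-implementation, not the step's actual code: the function name add_noise_sketch is ours, the "std" mode is assumed to use the population standard deviation, and a numeric relative value r is assumed to mean a per-value factor of r * |x|. The step's exact handling of these cases may differ.

```python
import numpy as np


def add_noise_sketch(values, relative="abs", dist_name="normal",
                     loc=0.0, scale=1.0, seed=None):
    """Hypothetical sketch: new value = original + factor * sample."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)

    # One sample per value from the chosen distribution.
    samplers = {
        "gumbel": rng.gumbel,
        "laplace": rng.laplace,
        "logistic": rng.logistic,
        "normal": rng.normal,
    }
    if dist_name not in samplers:
        raise ValueError(f"unsupported dist_name: {dist_name!r}")
    sample = samplers[dist_name](loc=loc, scale=scale, size=x.shape)

    # Relative scaling factor.
    if relative == "abs":
        factor = 1.0  # no extra scaling
    elif relative == "std":
        factor = np.std(x)  # assumption: population std of the column
    elif isinstance(relative, (int, float)):
        # Assumption: a number scales the noise proportionally to each value.
        factor = relative * np.abs(x)
    else:
        raise ValueError(f"unsupported relative: {relative!r}")

    return x + factor * sample
```

For example, add_noise_sketch([1.0, 2.0, 3.0], relative="std", seed=0) perturbs each value with normal noise multiplied by the column's standard deviation.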

Usage

The following examples show how the step can be used in a recipe.

Examples

Add white noise to a column of embeddings
add_noise(ds.embeddings) -> (ds.embeddings_with_noise)
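With the default settings (dist_name "normal", loc 0.0, scale 1.0, relative "abs"), the example above adds independent noise to every element of each embedding vector. The following NumPy sketch illustrates that behaviour; representing the column as a list of vectors and the helper name add_white_noise_to_embeddings are assumptions for illustration only.

```python
import numpy as np


def add_white_noise_to_embeddings(embeddings, loc=0.0, scale=1.0, seed=None):
    """Hypothetical sketch: add i.i.d. normal noise to every embedding element."""
    rng = np.random.default_rng(seed)
    result = []
    for vector in embeddings:
        v = np.asarray(vector, dtype=float)
        # One independent draw per element; relative factor is 1 ("abs").
        result.append(v + rng.normal(loc=loc, scale=scale, size=v.shape))
    return result


noisy = add_white_noise_to_embeddings([[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]], seed=42)
```

Looping per row lets vectors of different lengths pass through unchanged in shape, which matches an input type of list[number] per row.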

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").
input_column
column[number|list[number]]
required
The original column.
result
column
required
The input column with noise added.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a json object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
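As an illustration of how a JSON configuration object relates to the defaults listed under Parameters, here is a hypothetical Python sketch that merges user-supplied parameters over those defaults and validates dist_name. The helper resolve_config is not part of the platform, and treating the seed default as null (None) is an assumption, since no default is documented for it.

```python
import json

# Defaults as documented in the Parameters list (seed default assumed to be None).
DEFAULTS = {"relative": "abs", "dist_name": "normal", "loc": 0.0, "scale": 1.0, "seed": None}
ALLOWED_DISTS = {"gumbel", "laplace", "logistic", "normal"}


def resolve_config(config_json="{}"):
    """Hypothetical sketch: merge a user JSON config over the documented defaults."""
    user = json.loads(config_json)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **user}
    if params["dist_name"] not in ALLOWED_DISTS:
        raise ValueError(f"dist_name must be one of {sorted(ALLOWED_DISTS)}")
    return params


print(resolve_config('{"dist_name": "laplace", "scale": 0.5, "seed": 42}'))
```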

Parameters

relative
[number, string]
default:"abs"
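Mode to use. Set to “std” to scale the noise by the standard deviation of the column, or to a number to scale the sampling (proportionally to each value). With the default, “abs”, no relative scaling is applied (the factor is 1).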
dist_name
string
default:"normal"
Distribution that the noise is sampled from. Values must be one of the following:
  • gumbel
  • laplace
  • logistic
  • normal
loc
number
default:"0.0"
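Location (“centre”) of the chosen distribution. This is the mean for the normal, laplace and logistic distributions; for gumbel it is the mode.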
scale
number
default:"1.0"
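Scale (spread or “width”) of the distribution. For the normal distribution this is the standard deviation; for the other distributions the standard deviation is proportional to, but not equal to, the scale.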
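Only for the normal distribution are loc and scale exactly the mean and the standard deviation. The sketch below assumes the step uses the usual textbook parameterizations (as in NumPy's Generator methods of the same names) and computes the theoretical mean and standard deviation of the noise for each dist_name, then compares them with empirical estimates. The helper name noise_moments is hypothetical.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def noise_moments(dist_name, loc=0.0, scale=1.0):
    """Hypothetical helper: theoretical (mean, std) of the noise distribution."""
    if dist_name == "normal":
        return loc, scale
    if dist_name == "laplace":
        return loc, math.sqrt(2.0) * scale
    if dist_name == "logistic":
        return loc, math.pi * scale / math.sqrt(3.0)
    if dist_name == "gumbel":
        # loc is the mode; the mean is shifted by gamma * scale.
        return loc + EULER_GAMMA * scale, math.pi * scale / math.sqrt(6.0)
    raise ValueError(f"unsupported dist_name: {dist_name!r}")


rng = np.random.default_rng(0)
for name in ("gumbel", "laplace", "logistic", "normal"):
    draws = getattr(rng, name)(loc=0.0, scale=1.0, size=200_000)
    mean, std = noise_moments(name)
    print(f"{name:8s} theory mean={mean:.3f} std={std:.3f} "
          f"empirical mean={draws.mean():.3f} std={draws.std():.3f}")
```

In practice this means that, for the same scale, laplace, logistic and gumbel noise spreads values more widely than normal noise.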
seed
[number, null]
Seed for the random number generator. Set it to a fixed number to get reproducible results.
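A fixed seed makes the noise reproducible: the same seed produces the same samples, so re-running with it yields identical output. The sketch below illustrates the idea with NumPy's default_rng; that the step seeds its generator this way is an assumption, and the helper name sample_noise is hypothetical.

```python
import numpy as np


def sample_noise(n, seed=None):
    """Hypothetical helper: draw n standard normal samples with an optional seed."""
    rng = np.random.default_rng(seed)  # seed=None draws fresh entropy from the OS
    return rng.normal(loc=0.0, scale=1.0, size=n)


a = sample_noise(5, seed=42)
b = sample_noise(5, seed=42)
c = sample_noise(5, seed=7)
print(np.array_equal(a, b))  # same seed: identical noise
print(np.array_equal(a, c))  # different seed: different noise
```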