Add noise

math

Add noise to a column with numbers or lists of numbers.

Given a distribution name with scale and loc parameters, the step can optionally rescale the sampled noise through the relative parameter, either by the standard deviation of the column or proportionally to each point, in order to preserve the underlying structure of the data. The computation is then carried out as follows:

new value = original value + relative scaling factor * random sample from the distribution.

If the relative parameter is not given or is set to "abs", the relative scaling factor is 1.
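
The formula above can be sketched in plain Python. This is only an illustration under stated assumptions, not the step's actual implementation: the function name add_noise_sketch is hypothetical, only the normal distribution is covered, "std" is assumed to mean the population standard deviation of the column, and a numeric relative value is assumed to scale the noise by the magnitude of each point.

```python
import random
import statistics


def add_noise_sketch(values, relative="abs", loc=0.0, scale=1.0, seed=None):
    """Hypothetical sketch: new value = original value + factor * sample."""
    rng = random.Random(seed)
    if relative == "abs":
        # No additional scaling: the relative scaling factor is 1.
        factors = [1.0] * len(values)
    elif relative == "std":
        # Assumption: the population standard deviation of the column is used.
        std = statistics.pstdev(values)
        factors = [std] * len(values)
    else:
        # Assumption: a number scales the noise proportionally to each point.
        factors = [relative * abs(v) for v in values]
    return [v + f * rng.gauss(loc, scale) for v, f in zip(values, factors)]


column = [10.0, 12.0, 11.0, 13.0]
noisy_abs = add_noise_sketch(column, seed=0)
noisy_std = add_noise_sketch(column, relative="std", seed=0)
```

With the same seed, the "std" result differs from the "abs" result only in that each noise term is multiplied by the column's standard deviation.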

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
add_noise(input_column: number|list[number], {
    "param": value
}) -> (result: column)

where the object {"param": value} is optional in most cases and if present may contain any of the parameters described in the corresponding section below.

Example

Add white noise to a column of embeddings

Example call (in recipe editor)
add_noise(ds.embeddings) -> (ds.embeddings_with_noise)
More examples

Add std-dependent noise to a numerical column

Example call (in recipe editor)
add_noise(ds.number, {"relative": "std"}) -> (ds.number_with_noise)

Inputs


input_column: column:number|list[number]

The original column.

Outputs


result: column

The input column with noise added.

Parameters


relative: number | string = "abs"

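Mode to use. Set to "std" to scale the noise by the standard deviation of the column, to "abs" to apply no additional scaling, or to a number to scale the sampling proportionally to each point.

String values must be one of: "std", "abs"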


dist_name: string = "normal"

Distribution that the noise is sampled from.

Must be one of: "gumbel", "laplace", "logistic", "normal"
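
To make the four options concrete, here is a minimal sketch of how one sample could be drawn from each distribution using only the Python standard library. It assumes the usual location-scale parameterization of each distribution (loc shifts it, scale stretches it); the helper name sample is hypothetical and the step itself may sample differently.

```python
import math
import random


def sample(dist_name, loc=0.0, scale=1.0, rng=random):
    """Draw one sample; non-normal cases use inverse-transform sampling."""
    if dist_name == "normal":
        return rng.gauss(loc, scale)
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1) so the logs are defined
        u = rng.random()
    if dist_name == "laplace":
        c = u - 0.5
        return loc - scale * math.copysign(1.0, c) * math.log(1.0 - 2.0 * abs(c))
    if dist_name == "logistic":
        return loc + scale * math.log(u / (1.0 - u))
    if dist_name == "gumbel":
        return loc - scale * math.log(-math.log(u))
    raise ValueError(f"unknown distribution: {dist_name!r}")


rng = random.Random(0)
draws = {
    name: [sample(name, loc=0.0, scale=1.0, rng=rng) for _ in range(3)]
    for name in ("gumbel", "laplace", "logistic", "normal")
}
```

The normal, laplace and logistic distributions are symmetric around loc, while the gumbel distribution is skewed, so its samples average slightly above loc.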


loc: number = 0.0

Location ("centre") of the chosen distribution. For the normal distribution this is the mean.


scale: number = 1.0

Scale (spread or "width") of the distribution. For the normal distribution this is the standard deviation.


seed: number | null

The seed for the random number generator. Set it if you want reproducible results.
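
The effect of a seed can be illustrated with Python's standard random module. This is a generic sketch of seeded sampling, assuming the step behaves like a typical seeded generator; sample_noise is a hypothetical helper, not part of the step.

```python
import random


def sample_noise(n, seed=None):
    """Return n standard-normal samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first = sample_noise(3, seed=42)
second = sample_noise(3, seed=42)   # same seed: identical noise
unseeded = sample_noise(3)          # no seed: varies between runs
```

Here first and second are equal, while unseeded will generally differ each time the code is run.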