Add noise¶
math
Add noise to a column with numbers or lists of numbers.
Given a distribution name and its scale and loc parameters, the step optionally applies a further scaling to the sampled noise through the relative parameter, either based on the standard deviation of the column or proportionally to each point, in order to preserve the underlying structure of the data. The computation is then carried out as follows:
new value = original value + relative scaling factor * random sample from the distribution.
If the relative parameter is not given or is set to "abs", the relative scaling factor is 1.
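The formula above can be sketched in plain Python. This is an illustrative approximation, not the step's actual implementation: the helper name `add_noise` and its keyword arguments simply mirror the step's parameters, only the "abs" and "std" modes are covered, the noise is drawn from a normal distribution, and the use of the population standard deviation is an assumption.

```python
import random
import statistics


def add_noise(values, relative="abs", loc=0.0, scale=1.0, seed=None):
    """Return a copy of `values` with normally distributed noise added."""
    rng = random.Random(seed)  # seeded generator, so runs can be repeated
    if relative == "abs":
        factor = 1.0  # no additional scaling
    elif relative == "std":
        # Assumption: the column's population standard deviation is used.
        factor = statistics.pstdev(values)
    else:
        raise ValueError("this sketch only covers 'abs' and 'std'")
    # new value = original value + relative scaling factor * random sample
    return [v + factor * rng.gauss(loc, scale) for v in values]


column = [10.0, 20.0, 30.0, 40.0]
noisy_abs = add_noise(column, relative="abs", seed=0)
noisy_std = add_noise(column, relative="std", seed=0)
```

With the same seed both calls draw the same samples, so in this sketch the "std" result differs from the "abs" result only by the column's standard deviation multiplying each noise term.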
Usage¶
The following are the step's expected inputs and outputs and their specific types.
add_noise(input_column: number|list[number], {"param": value}) -> (result: column)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example¶
Add white noise to a column of embeddings
add_noise(ds.embeddings) -> (ds.embeddings_with_noise)
More examples
Add std-dependent noise to a numerical column
add_noise(ds.number, {"relative": "std"}) -> (ds.number_with_noise)
Inputs¶
input_column: column:number|list[number]
The original column.
Outputs¶
result: column
The input column with noise added.
Parameters¶
relative: number | string = "abs"
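Scaling mode. Set to "std" to scale the noise by the column's standard deviation, or use a number to scale the noise proportionally to each point. With "abs", no additional scaling is applied.
String values must be one of: "std", "abs"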
dist_name: string = "normal"
Distribution Function that noise is sampled from.
Must be one of: "gumbel", "laplace", "logistic", "normal"
loc: number = 0.0
Location ("centre") of the chosen distribution. For "normal" this is the mean.
scale: number = 1.0
Scale (spread or "width") of the chosen distribution. For "normal" this is the standard deviation.
seed: number | null
The seed for the random number generator. Set it if you want reproducible results.
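To make the roles of dist_name, loc, scale and seed concrete, here is a minimal Python sketch that draws from the four listed distributions using only the standard library. It is an assumption-laden illustration, not the step's own sampler: the helpers `sample` and `draw` are hypothetical names, and the non-normal distributions are generated by inverse-transform sampling from a uniform value.

```python
import math
import random


def sample(dist_name, rng, loc=0.0, scale=1.0):
    """Draw one value from the named distribution using generator `rng`."""
    if dist_name == "normal":
        return rng.gauss(loc, scale)
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1) so the logs are defined
        u = rng.random()
    if dist_name == "gumbel":
        return loc - scale * math.log(-math.log(u))
    if dist_name == "laplace":
        # Two-sided exponential centred on loc.
        if u < 0.5:
            return loc + scale * math.log(2.0 * u)
        return loc - scale * math.log(2.0 * (1.0 - u))
    if dist_name == "logistic":
        return loc + scale * math.log(u / (1.0 - u))
    raise ValueError(f"unknown dist_name: {dist_name!r}")


def draw(dist_name, n, seed=None, loc=0.0, scale=1.0):
    """Draw n values; the same seed always yields the same sequence."""
    rng = random.Random(seed)
    return [sample(dist_name, rng, loc, scale) for _ in range(n)]


run_a = draw("laplace", 5, seed=42)
run_b = draw("laplace", 5, seed=42)  # identical to run_a
run_c = draw("laplace", 5)           # unseeded, differs between runs
```

In this sketch a fixed seed makes repeated draws identical, while leaving it unset produces a different sequence on each run.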