Infer language¶
NLP • inference • model • text
Detect the language used for each text in the input column.
Each language is represented by its two-letter ISO 639-1 code, such as "en", "es" and "it" for English, Spanish and Italian, respectively.
Usage¶
The following are the step's expected inputs and outputs and their specific types.
infer_language(text: text, {"param": value}) -> (lang: category)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Example¶
In most cases no special configuration is necessary, so the following call is enough:
infer_language(ds.text) -> (ds.language)
Inputs¶
text: column:text
A text column to detect languages for.
Outputs¶
lang: column:category
A column identifying the language of each text using its two-letter ISO 639-1 language code.
Parameters¶
min_probability: number = 0.5
Minimum probability required to assign a language to a particular text. If the model's probability for the most likely language is lower than this, the text is assigned no language and gets a missing value (NaN) instead.
Range: 0 ≤ min_probability ≤ 1
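The effect of min_probability can be pictured with a small Python sketch. This is not the step's actual implementation: the function name `pick_language` and the input shape (a mapping from language code to model probability) are assumptions made for illustration only.

```python
import math
from typing import Mapping, Union


def pick_language(
    probs: Mapping[str, float], min_probability: float = 0.5
) -> Union[str, float]:
    """Return the most probable language code, or NaN if the model is not sure enough.

    Hypothetical helper: `probs` maps ISO 639-1 codes to probabilities.
    """
    if not 0 <= min_probability <= 1:
        raise ValueError("min_probability must be between 0 and 1")
    if not probs:
        # No candidate at all: treat as a missing value.
        return math.nan
    # Pick the language with the highest probability.
    best_lang = max(probs, key=probs.get)
    # Only a probability strictly below the threshold yields a missing value.
    if probs[best_lang] < min_probability:
        return math.nan
    return best_lang
```

With the default threshold, `pick_language({"en": 0.9, "es": 0.1})` returns "en", while a text whose best candidate only reaches 0.4 gets NaN. Raising min_probability trades coverage for precision: fewer texts receive a language, but the assigned ones are more reliable.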
allowed_languages: array[string] = ['en', 'es', 'pt', 'fr', 'de', 'it', 'eu', 'ca', 'tr', 'ar']
Only these languages will be inferred. Can be used to limit language detection to a smaller set if necessary.
Items in allowed_languages
item: string
Must be one of: "en", "es", "pt", "fr", "de", "it", "eu", "ca", "tr", "ar"
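As a quick reference, the accepted codes correspond to the languages listed in the Python sketch below, which also shows one way a custom allowed_languages list could be checked before it is passed to the step. `LANGUAGE_NAMES` and `validate_allowed_languages` are hypothetical names introduced here, not part of the step itself.

```python
from typing import Dict, Iterable, List

# ISO 639-1 codes accepted by allowed_languages, with their English names.
LANGUAGE_NAMES: Dict[str, str] = {
    "en": "English",
    "es": "Spanish",
    "pt": "Portuguese",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "eu": "Basque",
    "ca": "Catalan",
    "tr": "Turkish",
    "ar": "Arabic",
}


def validate_allowed_languages(codes: Iterable[str]) -> List[str]:
    """Return the codes as a list if all are supported, else raise ValueError."""
    codes = list(codes)
    unsupported = [code for code in codes if code not in LANGUAGE_NAMES]
    if unsupported:
        raise ValueError(f"Unsupported language codes: {unsupported}")
    return codes
```

For example, `validate_allowed_languages(["en", "es"])` returns the list unchanged, whereas a list containing "zh" raises a ValueError because that code is not among the ten supported ones.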