Categorize people into fields of occupation using their bios (biographies).
The categorization uses a predefined lookup table that maps keywords to fields of occupation. For example, a bio is categorized as "journalists" if its text contains any of the following words: "periodista", "journalist", "journalism", "periodismo", "news", "noticia", "noticias".
Possible categories currently are:
- social sciences
- art design
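The keyword lookup described above might be sketched as follows in Python. This is only an illustration, not the step's actual implementation: the names `KEYWORDS_BY_FIELD` and `label_bio` are hypothetical, only the "journalists" keywords are taken from the description, and case-insensitive whole-word matching is an assumption.

```python
import re

# Hypothetical lookup table. Only the "journalists" entry comes from the
# description; the real table and its other categories are not shown here.
KEYWORDS_BY_FIELD = {
    "journalists": [
        "periodista", "journalist", "journalism", "periodismo",
        "news", "noticia", "noticias",
    ],
}


def label_bio(bio, table=KEYWORDS_BY_FIELD):
    """Return every field whose keywords appear in the bio (assumed whole-word match)."""
    # Lowercase the bio and split it into words so matching ignores case.
    words = set(re.findall(r"\w+", bio.lower()))
    return [field for field, keywords in table.items() if words & set(keywords)]


bios = ["Periodista en Madrid. Noticias y deportes.", "Software engineer"]
labels = [label_bio(bio) for bio in bios]
```

Returning a list per bio reflects that a single bio may match more than one field, or none at all.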
This step has no configuration parameters, so simply use:
label_bios(ds.text) -> (ds.field_of_occupation)
The following are the step's expected inputs and outputs and their specific types.
label_bios(bios: text) -> (labels: list[category])
bios: A column containing biographies (e.g. from social network profiles).
labels: A column containing one or more fields of occupation for each bio.