Label bios

NLP · social · text

Categorize people into fields of occupation using their bios (biographies).

The categorization uses a predefined lookup table that maps keywords to fields of occupation. For example, a bio is categorized as "journalists" if its text contains any of the following words: "periodista", "journalist", "journalism", "periodismo", "news", "noticia", "noticias".
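The keyword lookup described above can be sketched in Python as follows. This is a minimal illustration and not the step's actual implementation. Only the "journalists" keywords are taken from the text; the "developers" keywords, the names KEYWORDS_BY_CATEGORY and label_bio, and the choice of case-insensitive whole-word matching are all assumptions. How the real step treats a bio with no matching keyword is not stated; the sketch returns an empty list in that case.

```python
import re

# Hypothetical excerpt of the lookup table. Only the "journalists" keywords
# come from the documentation; the "developers" entry is an invented placeholder.
KEYWORDS_BY_CATEGORY = {
    "journalists": ["periodista", "journalist", "journalism", "periodismo",
                    "news", "noticia", "noticias"],
    "developers": ["developer", "programmer"],  # assumed, for illustration only
}

def label_bio(bio: str) -> list[str]:
    """Return every category that has at least one keyword among the bio's words."""
    # Assumption: matching is case-insensitive and on whole words.
    words = set(re.findall(r"\w+", bio.lower()))
    return [category
            for category, keywords in KEYWORDS_BY_CATEGORY.items()
            if words.intersection(keywords)]

bios = [
    "Periodista freelance. Coffee lover.",
    "Journalist turned developer",
    "Gardener and amateur chef",
]

# One list of categories per bio; a bio may receive several labels.
labels = [label_bio(bio) for bio in bios]
print(labels)
```

In this sketch a bio can match more than one category, which mirrors the list-valued output of the step.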

The currently available categories are:

  • journalists
  • business
  • developers
  • marketing
  • travel
  • photography
  • university
  • seo
  • blogging
  • sports
  • politics
  • social sciences
  • medical
  • entertainment
  • art design
  • economics
  • videogames


This step has no configuration parameters, so it can be called simply as:

label_bios(ds.text) -> (ds.field_of_occupation)


The following are the step's expected inputs and outputs and their specific types.

label_bios(bios: text) -> (labels: list[category])


bios: column:text

A column containing biographies (e.g. from social network profiles).


labels: column:list[category]

A column containing one or more fields of occupation for each bio.