
Label bios

NLP · social · text

Categorize people into fields of occupation using their bios (biographies).

The categorization uses a predefined lookup table that maps keywords to fields of occupation. For example, a bio is categorized as "journalists" if its text contains any of the following words: "periodista", "journalist", "journalism", "periodismo", "news", "noticia", "noticias".
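The keyword lookup described above can be sketched in Python as follows. This is a minimal illustration, not the step's actual implementation: the helper name `label_bio` is hypothetical, only the "journalists" keywords come from this page, and the case-insensitive whole-word matching is an assumption.

```python
import re

# Keyword lookup table. Only the "journalists" entry is taken from this page;
# the real table covers every category listed below.
KEYWORDS: dict[str, set[str]] = {
    "journalists": {
        "periodista", "journalist", "journalism", "periodismo",
        "news", "noticia", "noticias",
    },
}


def label_bio(bio: str) -> list[str]:
    """Return every category whose keywords appear in the bio (hypothetical helper)."""
    # Assumption: matching is case-insensitive and on whole words only.
    words = set(re.findall(r"\w+", bio.lower()))
    return [category for category, keywords in KEYWORDS.items() if words & keywords]


print(label_bio("Periodista. Opiniones propias."))  # ['journalists']
print(label_bio("Coffee lover"))                   # []
```

Because this sketch matches whole words, a bio containing "newsroom" would not match "news"; the actual step may use substring matching instead.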

The currently supported categories are:

  • journalists
  • business
  • developers
  • marketing
  • travel
  • photography
  • university
  • seo
  • blogging
  • sports
  • politics
  • social sciences
  • medical
  • entertainment
  • art design
  • economics
  • videogames

Example

This step has no configuration parameters, so it can be used simply as:

label_bios(ds.text) -> (ds.field_of_occupation)

Usage

The following are the step's expected inputs and outputs, along with their specific types.

label_bios(bios: text) -> (labels: list[category])

Inputs


bios: column:text

A column containing biographies (e.g. from social network profiles).

Outputs


labels: column:list[category]

A column containing one or more fields of occupation for each bio.
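To illustrate the column-in, column-out shape (`text` in, `list[category]` out), here is a hedged Python sketch that maps a list of bios to a list of label lists, one per row. The Python function is a hypothetical stand-in for the step, the "photography" keywords are invented for illustration, and returning an empty list for missing or unmatched bios is an assumption.

```python
import re
from typing import Optional

# Hypothetical keyword table: the "journalists" keywords come from this page,
# the "photography" keywords are invented for illustration only.
KEYWORDS: dict[str, set[str]] = {
    "journalists": {"periodista", "journalist", "journalism", "periodismo",
                    "news", "noticia", "noticias"},
    "photography": {"photographer", "photography"},
}


def label_bios(bios: list[Optional[str]]) -> list[list[str]]:
    """Map a column of bios to a column of category lists, one list per row."""
    labels: list[list[str]] = []
    for bio in bios:
        # Assumption: a missing bio is treated as empty text and gets no labels.
        words = set(re.findall(r"\w+", (bio or "").lower()))
        # A bio can match several categories, so each row holds a list.
        labels.append([cat for cat, kws in KEYWORDS.items() if words & kws])
    return labels


column = ["Photographer and journalist", "News junkie", None]
print(label_bios(column))
# [['journalists', 'photography'], ['journalists'], []]
```

The first bio shows why the output type is a list: it matches keywords from two categories and receives both labels.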