Label bios

NLP, social, text

Categorize people into fields of occupation using their bios (biographies).

The categorization uses a predefined lookup table that maps keywords to fields of occupation. For example, a bio is categorized as "journalists" if its text contains any of the following words: "periodista", "journalist", "journalism", "periodismo", "news", "noticia", "noticias".
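The keyword check for a single category can be sketched as follows. This is a minimal illustration, not the step's actual implementation: it assumes case-insensitive, whole-word matching, and the function name `is_journalist` is hypothetical. Only the keyword list itself comes from the description above.

```python
import re

# Keywords for the "journalists" category, as listed in the description above.
# The step's full lookup table (covering the other categories) is not reproduced here.
JOURNALIST_KEYWORDS = {
    "periodista", "journalist", "journalism", "periodismo",
    "news", "noticia", "noticias",
}


def is_journalist(bio: str) -> bool:
    """Return True if the bio contains any journalist keyword as a whole word.

    Assumption: text is lowercased and split into word tokens before matching;
    the real step may tokenize or normalize text differently.
    """
    words = set(re.findall(r"\w+", bio.lower()))
    return not words.isdisjoint(JOURNALIST_KEYWORDS)
```

With this sketch, `is_journalist("Periodista en El País")` is true, while `is_journalist("Software developer")` is false. Word-based matching avoids accidental hits inside longer words, at the cost of missing keyword variants that are not listed.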

Possible categories currently are:

  • journalists
  • business
  • developers
  • marketing
  • travel
  • photography
  • university
  • seo
  • blogging
  • sports
  • politics
  • social sciences
  • medical
  • entertainment
  • art design
  • economics
  • videogames

Usage


The following are the step's expected inputs and outputs and their specific types.

Step signature
label_bios(bios: text) -> (labels: list[category])


Example

This step has no configuration parameters, so a call only needs to map the input column to the output column:

Example call (in recipe editor)
label_bios(ds.text) -> (ds.field_of_occupation)

Inputs


bios: column:text

A column containing biographies (e.g. from social network profiles).

Outputs


labels: column:list[category]

A column containing one or more fields of occupation for each bio.
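The column-in, column-out behaviour (one text per row in, one list of categories per row out) can be sketched as below. This is a hypothetical illustration, not the step's real code: the function names `label_bio` and `label_bios_sketch` are invented here, only the "journalists" keywords come from the description above, and the "developers" keywords are placeholders added to show a bio receiving more than one label.

```python
import re

# Hypothetical lookup table. Only the "journalists" keywords come from the
# step description; the "developers" keywords are illustrative placeholders.
LOOKUP = {
    "journalists": {"periodista", "journalist", "journalism", "periodismo",
                    "news", "noticia", "noticias"},
    "developers": {"developer", "programmer"},  # hypothetical keywords
}


def label_bio(bio: str) -> list[str]:
    """Return every category whose keywords appear in the bio as whole words."""
    words = set(re.findall(r"\w+", bio.lower()))
    return [category for category, keywords in LOOKUP.items()
            if not words.isdisjoint(keywords)]


def label_bios_sketch(bios: list[str]) -> list[list[str]]:
    """Map a column of bios to a column of category lists, one list per row."""
    return [label_bio(bio) for bio in bios]
```

For example, `label_bios_sketch(["Journalist and developer", "Noticias de Madrid", "I love cats"])` returns `[["journalists", "developers"], ["journalists"], []]`. In this sketch a bio with no matching keyword gets an empty list; how the actual step handles such rows is not specified here.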