Label bios¶
NLP • social • text
Categorize people into fields of occupation using their bios (biographies).
Categorization uses a predefined lookup table that maps keywords to fields of occupation. For example, a bio is categorized as "journalists" if its text contains any of the following words: "periodista", "journalist", "journalism", "periodismo", "news", "noticia", "noticias".
The currently available categories are:
- journalists
- business
- developers
- marketing
- travel
- photography
- university
- seo
- blogging
- sports
- politics
- social sciences
- medical
- entertainment
- art design
- economics
- videogames
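The keyword lookup described above can be illustrated with a minimal Python sketch. This is not the step's actual implementation: the function name `label_bio`, the `KEYWORDS` table layout, whole-word and case-insensitive matching, and the "developers" keywords are all assumptions made for illustration. Only the "journalists" keywords come from the description above.

```python
import re

# Hypothetical lookup table. Only the "journalists" keywords are taken from
# the documentation; the "developers" entry is an invented placeholder.
KEYWORDS = {
    "journalists": ["periodista", "journalist", "journalism", "periodismo",
                    "news", "noticia", "noticias"],
    "developers": ["developer", "programmer"],  # assumed, for illustration
}


def label_bio(bio: str) -> list[str]:
    """Return every category with at least one keyword present in the bio.

    Assumption: matching is case-insensitive and on whole words.
    """
    words = set(re.findall(r"\w+", bio.lower()))
    return [category for category, keywords in KEYWORDS.items()
            if words & set(keywords)]


print(label_bio("Periodista y developer en Madrid"))
# ['journalists', 'developers']
```

Because a bio can match keywords from several categories, the sketch returns a list, mirroring the step's `list[category]` output type. A bio with no matching keyword yields an empty list here; how the real step handles that case is not stated.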
Usage¶
The following are the step's expected inputs and outputs and their specific types.
label_bios(bios: text) -> (labels: list[category])
Example¶
This step has no configuration parameters, so simply use:
label_bios(ds.text) -> (ds.field_of_occupation)
Inputs¶
bios: column:text
A column containing biographies (e.g. from social network profiles).
Outputs¶
labels: column:list[category]
A column containing one or more fields of occupation for each bio.