The categorization is performed using a predefined lookup table that maps certain keywords to associated fields of occupation. For example, a bio is categorized as “journalists” if its text contains any of the following words: “periodista”, “journalist”, “journalism”, “periodismo”, “news”, “noticia”, “noticias”.

The categories currently available are:

  • journalists
  • business
  • developers
  • marketing
  • travel
  • photography
  • university
  • seo
  • blogging
  • sports
  • politics
  • social sciences
  • medical
  • entertainment
  • art design
  • economics
  • videogames
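
The keyword lookup described above can be sketched in Python as follows. This is a minimal illustration, not the step's actual implementation: the function name categorize_bio and the table name KEYWORDS are hypothetical, only the “journalists” keywords are taken from the text above, and the “developers” keywords are placeholders. The sketch also assumes case-insensitive whole-word matching and that a bio may match more than one category, neither of which is confirmed by this page.

```python
import re

# Lookup table mapping each category to its trigger keywords.
# Only the "journalists" entry comes from the documentation;
# the "developers" entry is a hypothetical placeholder.
KEYWORDS = {
    "journalists": {
        "periodista", "journalist", "journalism",
        "periodismo", "news", "noticia", "noticias",
    },
    "developers": {"developer", "programmer"},  # hypothetical keywords
}


def categorize_bio(bio):
    """Return every category with at least one keyword in the bio.

    Assumption: matching is case-insensitive and on whole words.
    """
    # Split the lowercased bio into a set of word tokens.
    words = set(re.findall(r"\w+", bio.lower()))
    # Keep each category whose keyword set overlaps the bio's words.
    return [
        category
        for category, keywords in KEYWORDS.items()
        if words & keywords
    ]
```

Under these assumptions, a bio such as "Freelance journalist covering local news" would fall into the journalists category, while a bio that contains none of the keywords would receive no category.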

Usage

The following example shows how the step can be used in a recipe.

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]), or models (referenced by name, e.g. "churn-clf").

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).