aggregate_tweets_by_author
Group a dataset of tweets by author and calculate relevant author statistics.
Works like the generic aggregate step, but with a predefined set of aggregation functions. See the ds_out argument below for the columns generated in the resulting dataset.
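To make the idea of grouping by author concrete, here is a minimal sketch in plain Python. It is not the step's implementation: the function name aggregate_by_author and the statistics it computes (n_tweets and summed retweets and favorites) are assumptions chosen for illustration. The columns the step actually produces are those listed for ds_out.

```python
from collections import defaultdict

def aggregate_by_author(tweets):
    """Group tweet records by author_id and sum a few per-author totals.

    Hypothetical helper: the statistics here are illustrative only and are
    not necessarily the ones the step computes.
    """
    stats = defaultdict(lambda: {"n_tweets": 0, "retweets": 0, "favorites": 0})
    for tweet in tweets:
        row = stats[tweet["author_id"]]
        row["n_tweets"] += 1
        row["retweets"] += tweet.get("retweets", 0)
        row["favorites"] += tweet.get("favorites", 0)
    return dict(stats)

# One input row per tweet, one output entry per author.
tweets = [
    {"author_id": "a1", "retweets": 3, "favorites": 5},
    {"author_id": "a1", "retweets": 1, "favorites": 0},
    {"author_id": "a2", "retweets": 2, "favorites": 7},
]
result = aggregate_by_author(tweets)
```

The key point is the change of granularity: the input has one row per tweet, the result has one row per author.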
Usage
The following shows how the step can be used in a recipe.
General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details, see the sections below.
Inputs & Outputs
The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name, e.g. "churn-clf").
Configuration
The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).
Whether to add rows for accounts only “mentioned” in original tweets.
If mentions or replies are recorded in the dataset (in columns mention_ids, mention_names and/or rp_user_id, rp_user_name), the step will add the corresponding accounts as rows in the result, even if they didn’t have a tweet in the original dataset.
It will also add mentions and replies columns recording how many times the accounts were mentioned or replied to.
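The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the step's actual code: it assumes mention_ids holds a list of account ids per tweet and rp_user_id holds a single id or None. The helper name add_mentioned_accounts and the has_tweets flag are hypothetical.

```python
from collections import Counter

def add_mentioned_accounts(tweets):
    """Return one row per account, including accounts that were only
    mentioned or replied to, with counts of mentions and replies.

    Assumes mention_ids is a list of ids and rp_user_id is an id or None.
    """
    authors = {t["author_id"] for t in tweets}
    mentions = Counter()
    replies = Counter()
    for t in tweets:
        mentions.update(t.get("mention_ids") or [])
        if t.get("rp_user_id") is not None:
            replies[t["rp_user_id"]] += 1
    # Union of tweet authors and accounts seen only in mentions or replies.
    accounts = authors | set(mentions) | set(replies)
    return {
        acc: {
            "has_tweets": acc in authors,  # hypothetical flag, for illustration
            "mentions": mentions[acc],     # Counter yields 0 for unseen ids
            "replies": replies[acc],
        }
        for acc in accounts
    }

tweets = [
    {"author_id": "a1", "mention_ids": ["a2", "a3"], "rp_user_id": None},
    {"author_id": "a2", "mention_ids": ["a3"], "rp_user_id": "a1"},
]
rows = add_mentioned_accounts(tweets)
```

Here account a3 never tweeted, yet it still gets a row because it was mentioned in two tweets.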
Column Map.
If the names of any of your dataset’s columns don’t correspond to those we expect to find in a tweet dataset (e.g. one originating in Twitter’s own API), you can provide a mapping of the sort {"your_column": "author_id"}.
The expected column names are [author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name, text].
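As a rough illustration of what such a mapping does, the sketch below renames the keys of each record from your own column names to the expected ones. The helper apply_column_map and the input names user and body are hypothetical. Rejecting mapping targets outside the expected names is a design choice of this sketch, not documented behaviour of the step.

```python
EXPECTED_COLUMNS = {
    "author_id", "author_handler", "author_name", "author_avatar", "links",
    "date", "id", "retweets", "favorites", "mention_ids", "mention_names",
    "rp_user_id", "rp_user_name", "text",
}

def apply_column_map(rows, column_map):
    """Rename record keys according to {"your_column": "expected_name"}."""
    unknown = set(column_map.values()) - EXPECTED_COLUMNS
    if unknown:
        # Sketch-only safeguard: catch typos in the target names early.
        raise ValueError(f"Not an expected column name: {sorted(unknown)}")
    # Keys without an entry in column_map are kept unchanged.
    return [{column_map.get(key, key): value for key, value in row.items()}
            for row in rows]

rows = [{"user": "a1", "body": "hello", "id": 1}]
renamed = apply_column_map(rows, {"user": "author_id", "body": "text"})
```

Columns that already carry an expected name, such as id here, need no entry in the mapping.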