Aggregate tweets by author
group by
Group a dataset of tweets by author and calculate relevant author statistics.
Works like the generic aggregate step, but with a predefined set of aggregation functions. See the ds_out output below for the columns generated in the resulting dataset.
Usage
The following are the step's expected inputs and outputs and their specific types.
aggregate_tweets_by_author(ds_in: dataset, {"param": value}) -> (ds_out: dataset)
where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the corresponding section below.
Inputs
ds_in: dataset
A dataset where each row is a tweet.
Outputs
ds_out: dataset
Result of the aggregation, where each row is a Twitter account. For each author it includes up to the following columns, depending on the information present in the original dataset:
author_id: Official Twitter ID
tweet_count: Number of tweets by this author
handler: Official Twitter handle
name: User name
pic: Link to user's profile picture
links: A list of links mentioned by the user
dates: A list of dates of published tweets by this author
tweet_ids: The official Twitter IDs of the tweets published by the author
retweets: The number of retweets received
favorites: The number of favorites received
mention_ids: List of other accounts (IDs) the author has mentioned
mention_names: List of other accounts (names) the author has mentioned
rp_user_ids: List of other accounts (IDs) the author has replied to
rp_user_names: List of other accounts (names) the author has replied to
mentions: The count of mentions received
replies: The count of replies received
tweet_text: The text of the author's tweets, concatenated.
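To make the shape of the output concrete, here is a minimal pure-Python sketch of a group-by over tweet rows. It is not the step's actual implementation: the function name `aggregate_by_author` is hypothetical, only a subset of the output columns is computed, and the choices to sum retweets and favorites, take the handle from the first tweet, and join texts with a space are assumptions.

```python
from collections import defaultdict


def aggregate_by_author(tweets):
    """Group tweet rows (dicts) by author_id and compute a few per-author columns."""
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["author_id"]].append(tweet)

    authors = []
    for author_id, rows in groups.items():
        authors.append({
            "author_id": author_id,
            "tweet_count": len(rows),
            # Assumption: the handle is taken from the author's first tweet.
            "handler": rows[0].get("author_handler"),
            "tweet_ids": [r["id"] for r in rows],
            "dates": [r.get("date") for r in rows],
            # Assumption: received counts are summed over the author's tweets.
            "retweets": sum(r.get("retweets", 0) for r in rows),
            "favorites": sum(r.get("favorites", 0) for r in rows),
            # Assumption: texts are joined with a single space.
            "tweet_text": " ".join(r.get("text", "") for r in rows),
        })
    return authors


# Illustrative input using the expected column names.
tweets = [
    {"id": "1", "author_id": "a1", "author_handler": "alice",
     "date": "2021-01-01", "retweets": 2, "favorites": 5, "text": "hello"},
    {"id": "2", "author_id": "a1", "author_handler": "alice",
     "date": "2021-01-02", "retweets": 1, "favorites": 0, "text": "world"},
    {"id": "3", "author_id": "b2", "author_handler": "bob",
     "date": "2021-01-02", "retweets": 0, "favorites": 1, "text": "hi"},
]
ds_out = aggregate_by_author(tweets)
```

Each element of `ds_out` corresponds to one row of the resulting dataset, that is, one account.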
Parameters
add_referenced_accounts: boolean = True
Whether to add rows for accounts that are only mentioned or replied to in the original tweets. If mentions or replies are recorded in the dataset (in columns mention_ids, mention_names and/or rp_user_id, rp_user_name), the corresponding accounts are added as rows in the result, even if they have no tweets of their own in the original dataset. The result also gains mentions and replies columns recording how many times each account was mentioned or replied to.
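The behaviour described for add_referenced_accounts can be sketched roughly as follows. This is an illustration only, not the step's code: the function name is hypothetical, and it assumes that mention_ids is a list of IDs per tweet, that rp_user_id holds a single ID (or is empty for non-replies), and that accounts without tweets of their own get a tweet_count of 0.

```python
from collections import Counter


def add_referenced_accounts(authors, tweets):
    """Count mentions/replies received and add rows for accounts that never tweeted."""
    mentions = Counter()
    replies = Counter()
    for tweet in tweets:
        # Assumption: mention_ids is a list of account IDs (possibly missing).
        mentions.update(tweet.get("mention_ids") or [])
        # Assumption: rp_user_id is a single ID, or None/absent for non-replies.
        if tweet.get("rp_user_id"):
            replies[tweet["rp_user_id"]] += 1

    rows = {a["author_id"]: dict(a) for a in authors}
    for account_id in set(mentions) | set(replies):
        # Assumption: referenced-only accounts get a row with tweet_count 0.
        rows.setdefault(account_id, {"author_id": account_id, "tweet_count": 0})
    for account_id, row in rows.items():
        row["mentions"] = mentions[account_id]  # Counter returns 0 if absent
        row["replies"] = replies[account_id]
    return list(rows.values())


tweets = [
    {"author_id": "a1", "mention_ids": ["b2", "c3"], "rp_user_id": None},
    {"author_id": "b2", "mention_ids": ["c3"], "rp_user_id": "a1"},
]
authors = [
    {"author_id": "a1", "tweet_count": 1},
    {"author_id": "b2", "tweet_count": 1},
]
result = add_referenced_accounts(authors, tweets)
```

In this example, account `c3` never tweeted but is mentioned twice, so it appears in `result` as an extra row.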
column_map: object
Column Map. If the names of any of your dataset's columns don't correspond to those expected in a tweet dataset (e.g. one originating from Twitter's own API), you can provide a mapping of the sort {"your_column": "author_id"}.
The expected column names are [author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name, text].
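As a rough illustration of what such a mapping does, the sketch below renames the keys of each row from custom names to the expected ones. The helper `apply_column_map` and the source column names (`user_id`, `screen_name`, `body`) are hypothetical; how the step applies the mapping internally is not documented here.

```python
def apply_column_map(rows, column_map):
    """Rename keys of each row according to {"your_column": "expected_name"}."""
    return [
        # Keys not present in the mapping are kept unchanged.
        {column_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical dataset whose column names differ from the expected ones.
column_map = {"user_id": "author_id", "screen_name": "author_handler", "body": "text"}
raw = [{"user_id": "a1", "screen_name": "alice", "body": "hello", "id": "1"}]
renamed = apply_column_map(raw, column_map)
```

Columns that already use an expected name, such as `id` above, need no entry in the mapping.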