aggregate_tweets_by_author
Group a dataset of tweets by author and calculate relevant author statistics.
aggregate_tweets_by_author(ds_in: dataset, {
"param": value,
...
}) -> (ds_out: dataset)
Works like the generic aggregate step, but with a predefined set of aggregation functions. See the ds_out argument below for the columns generated in the resulting dataset.
ds_in
: A dataset where each row is a tweet.
ds_out
: Result of the aggregation, where each row is a Twitter account. For each author it will include up to the following columns, depending on the information present in the original dataset:
author_id
: Official Twitter ID
tweet_count
: Number of tweets by this author
handler
: Official Twitter handle
name
: User name
pic
: Link to user’s profile picture
links
: A list of links mentioned by the user
dates
: A list of dates of published tweets by this author
tweet_ids
: The official Twitter IDs of the tweets published by the author
retweets
: The number of retweets received
favorites
: The number of favorites received
mention_ids
: List of other accounts (IDs) the author has mentioned
mention_names
: List of other accounts (names) the author has mentioned
rp_user_ids
: List of other accounts (IDs) the author has replied to
rp_user_names
: List of other accounts (names) the author has replied to
mentions
: The count of mentions received
replies
: The count of replies received
tweet_text
: The text of the author’s tweets, concatenated.
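To make the shape of the result concrete, here is a minimal Python sketch of a per-author aggregation over a few of the columns listed above. It is an illustration only, not the step's actual implementation: the function name `aggregate_by_author` is hypothetical, and it assumes that retweets and favorites are summed per author and that tweet texts are joined with a single space.

```python
def aggregate_by_author(tweets):
    """Group tweet dicts by author_id and compute a few per-author columns.

    Hypothetical helper for illustration; summing the counts and joining
    texts with a space are assumptions, not documented behaviour.
    """
    authors = {}
    for tweet in tweets:
        # Create the author's row the first time the author is seen.
        row = authors.setdefault(tweet["author_id"], {
            "author_id": tweet["author_id"],
            "tweet_count": 0,
            "tweet_ids": [],
            "dates": [],
            "retweets": 0,
            "favorites": 0,
            "tweet_text": [],
        })
        row["tweet_count"] += 1
        row["tweet_ids"].append(tweet["id"])
        row["dates"].append(tweet["date"])
        row["retweets"] += tweet.get("retweets", 0)
        row["favorites"] += tweet.get("favorites", 0)
        row["tweet_text"].append(tweet.get("text", ""))

    # Concatenate each author's texts into a single string.
    for row in authors.values():
        row["tweet_text"] = " ".join(row["tweet_text"])
    return list(authors.values())


sample_tweets = [
    {"author_id": "1", "id": "a", "date": "2021-01-01",
     "retweets": 2, "favorites": 1, "text": "hello"},
    {"author_id": "1", "id": "b", "date": "2021-01-02",
     "retweets": 3, "favorites": 0, "text": "world"},
    {"author_id": "2", "id": "c", "date": "2021-01-03",
     "retweets": 0, "favorites": 5, "text": "hi"},
]
per_author = aggregate_by_author(sample_tweets)
```

With the sample input, `per_author` holds one row per author, in order of first appearance.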
Whether to add rows for accounts only “mentioned” in original tweets.
If mentions or replies are recorded in the dataset (in columns mention_ids, mention_names and/or rp_user_id, rp_user_name), the step will add the corresponding accounts as rows in the result, even if they didn’t have a tweet in the original dataset.
It will also add mentions and replies columns recording how many times each account was mentioned or replied to.
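The behaviour described above might be sketched in Python as follows. This is a rough illustration under stated assumptions, not the step's real code: the helper `add_mentioned_accounts` is hypothetical, `mention_ids` is assumed to hold a list of account IDs per tweet, `rp_user_id` a single ID or nothing, and accounts without tweets are given a `tweet_count` of 0 (the actual step may leave such values empty instead).

```python
from collections import Counter


def add_mentioned_accounts(author_rows, tweets):
    """Add rows for accounts that were only mentioned or replied to.

    Hypothetical helper for illustration. Also attaches `mentions` and
    `replies` counts to every row.
    """
    mentions = Counter()
    replies = Counter()
    for tweet in tweets:
        # Assumption: mention_ids is a list of account IDs.
        mentions.update(tweet.get("mention_ids") or [])
        # Assumption: rp_user_id is a single account ID, or None.
        replied_to = tweet.get("rp_user_id")
        if replied_to is not None:
            replies[replied_to] += 1

    rows = {row["author_id"]: dict(row) for row in author_rows}
    # Accounts that never tweeted still get a row of their own.
    for account_id in sorted(set(mentions) | set(replies)):
        rows.setdefault(account_id, {"author_id": account_id, "tweet_count": 0})

    for account_id, row in rows.items():
        row["mentions"] = mentions[account_id]  # 0 if never mentioned
        row["replies"] = replies[account_id]    # 0 if never replied to
    return list(rows.values())
```

For example, if the only author in the dataset mentions accounts "2" and "3" and replies to "2", the result contains three rows, with "2" and "3" present only because they were mentioned or replied to.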
Column Map.
If the names of any of your dataset’s columns don’t correspond to those we expect to find in a tweet dataset (e.g. one originating in Twitter’s own API), you can provide a mapping of the form {"your_column": "author_id"}.
The expected column names are [author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name, text].
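Conceptually, a column map of this kind amounts to renaming columns before the aggregation runs. The Python sketch below illustrates the idea; `apply_column_map` and the validation against the expected names are hypothetical additions for illustration and do not describe how the step itself handles the mapping.

```python
# Expected column names, as listed in the documentation above.
EXPECTED_COLUMNS = [
    "author_id", "author_handler", "author_name", "author_avatar",
    "links", "date", "id", "retweets", "favorites",
    "mention_ids", "mention_names", "rp_user_id", "rp_user_name", "text",
]


def apply_column_map(rows, column_map):
    """Rename keys in each row according to {"your_column": "expected_name"}.

    Hypothetical helper; rejecting unknown target names is an assumption
    made here to catch typos early.
    """
    unknown = set(column_map.values()) - set(EXPECTED_COLUMNS)
    if unknown:
        raise ValueError(f"Not an expected column name: {sorted(unknown)}")
    # Columns without an entry in the map keep their original name.
    return [
        {column_map.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


renamed = apply_column_map(
    [{"user": "1", "body": "hi", "retweets": 2}],
    {"user": "author_id", "body": "text"},
)
```

Here `renamed` contains a single row whose `user` and `body` keys have become `author_id` and `text`, while `retweets` is left unchanged.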