
Aggregate tweets by author

group by

Group a dataset of tweets by author and calculate relevant author statistics.

Works like the generic aggregate step, but with a predefined set of aggregation functions. See the ds_out argument below for the columns generated in the resulting dataset.

Usage

The following are the step's expected inputs and outputs and their specific types.

aggregate_tweets_by_author(ds_in: dataset, {"param": value}) -> (ds_out: dataset)

where the object {"param": value} is optional in most cases and, if present, may contain any of the parameters described in the Parameters section below.

Inputs


ds_in: dataset

A dataset where each row is a tweet.

Outputs


ds_out: dataset

Result of the aggregation, where each row is a Twitter account. For each author it includes up to the following columns, depending on the information present in the original dataset:

  • author_id: Official Twitter ID
  • tweet_count: Number of tweets by this author
  • handler: Official Twitter handle
  • name: User name
  • pic: Link to user's profile picture
  • links: A list of links mentioned by the user
  • dates: A list of dates of published tweets by this author
  • tweet_ids: The official Twitter IDs of the tweets published by the author
  • retweets: The number of retweets received
  • favorites: The number of favorites received
  • mention_ids: List of other accounts (IDs) the author has mentioned
  • mention_names: List of other accounts (names) the author has mentioned
  • rp_user_ids: List of other accounts (IDs) the author has replied to
  • rp_user_names: List of other accounts (names) the author has replied to
  • mentions: The count of mentions received
  • replies: The count of replies received
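
To make the shape of ds_out more concrete, the grouping can be approximated in plain Python. This is only a rough sketch and not the step's actual implementation: the function name aggregate_by_author is hypothetical, rows are modelled as dictionaries, and summing retweets and favorites across an author's tweets is an assumption about how those columns are computed.

```python
from collections import defaultdict


def aggregate_by_author(tweets):
    """Group tweet rows (dicts) by author_id and compute per-author statistics.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["author_id"]].append(tweet)

    authors = []
    for author_id, rows in groups.items():
        first = rows[0]  # profile fields are taken from the author's first tweet
        authors.append({
            "author_id": author_id,
            "tweet_count": len(rows),
            "handler": first.get("author_handler"),
            "name": first.get("author_name"),
            "tweet_ids": [r["id"] for r in rows],
            "dates": [r["date"] for r in rows],
            # Assumption: counts are summed over all tweets by the author.
            "retweets": sum(r.get("retweets", 0) for r in rows),
            "favorites": sum(r.get("favorites", 0) for r in rows),
        })
    return authors


tweets = [
    {"id": "t1", "author_id": "a1", "author_handler": "ana", "author_name": "Ana",
     "date": "2021-01-01", "retweets": 2, "favorites": 5},
    {"id": "t2", "author_id": "a1", "author_handler": "ana", "author_name": "Ana",
     "date": "2021-01-02", "retweets": 1, "favorites": 0},
    {"id": "t3", "author_id": "a2", "author_handler": "bob", "author_name": "Bob",
     "date": "2021-01-03", "retweets": 0, "favorites": 4},
]
authors = aggregate_by_author(tweets)
```

Columns that are missing from the input (for example links or mention_ids) are simply not produced in this sketch, which mirrors the "up to the following columns" wording above.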

Parameters


add_referenced_accounts: boolean = True

Add Referenced Accounts. Tries to add referenced accounts as new rows if conditions are met. Needs either "mention_ids" and "mention_names" or "rp_user_id" and "rp_user_name". Adds "mentions" column if "mention_ids" and "mention_names" are provided. Adds "replies" column if "rp_user_id" and "rp_user_name" are provided.
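
One possible reading of this parameter, sketched in plain Python for the mentions case only. The exact conditions the step checks are not documented here, so the following are assumptions: mention_ids and mention_names are parallel lists on each tweet, accounts that were mentioned but never tweeted are appended with a tweet_count of 0, and the mentioned name is stored under name. The function name add_referenced_accounts_sketch is hypothetical.

```python
from collections import Counter


def add_referenced_accounts_sketch(authors, tweets):
    """Add a 'mentions' count to each author and append mentioned-only accounts.

    Hypothetical helper; an assumed interpretation, not the step's real logic.
    """
    mentions = Counter()
    names = {}
    for tweet in tweets:
        ids = tweet.get("mention_ids", [])
        mentioned_names = tweet.get("mention_names", [])
        for account_id, account_name in zip(ids, mentioned_names):
            mentions[account_id] += 1
            names[account_id] = account_name

    known = {a["author_id"] for a in authors}
    # Existing authors keep their columns and gain a 'mentions' count.
    rows = [dict(a, mentions=mentions.get(a["author_id"], 0)) for a in authors]
    # Accounts that only appear as mentions become new rows (assumption).
    for account_id, count in mentions.items():
        if account_id not in known:
            rows.append({
                "author_id": account_id,
                "name": names[account_id],
                "tweet_count": 0,
                "mentions": count,
            })
    return rows


authors_in = [{"author_id": "a1", "tweet_count": 2}]
tweets_in = [
    {"author_id": "a1", "mention_ids": ["a2", "a3"], "mention_names": ["bob", "cat"]},
    {"author_id": "a1", "mention_ids": ["a2"], "mention_names": ["bob"]},
]
with_referenced = add_referenced_accounts_sketch(authors_in, tweets_in)
```

The replies column would follow the same pattern using rp_user_id and rp_user_name.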


column_map: object

Column Map. If any of your dataset's columns are not in our expected tweet dataset format, e.g. author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name and text, you can provide a mapping of the sort {"your_column": "author_id"}.
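
The effect of such a mapping can be illustrated with a short Python sketch. The source column names (user_id, screen_name, created_at) and the helper apply_column_map are hypothetical; only the direction of the mapping, from your column name to the expected name, is taken from the description above.

```python
def apply_column_map(rows, column_map):
    """Rename keys in each row according to column_map; unmapped keys are kept.

    Hypothetical helper showing what a column_map is assumed to achieve.
    """
    return [
        {column_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Keys are the names used in your dataset, values are the expected names.
column_map = {
    "user_id": "author_id",
    "screen_name": "author_handler",
    "created_at": "date",
}
raw_rows = [
    {"user_id": "a1", "screen_name": "ana", "created_at": "2021-01-01", "text": "hello"},
]
renamed_rows = apply_column_map(raw_rows, column_map)
```

Columns that already use an expected name, such as text here, need no entry in the mapping.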

Items in column_map