> ## Documentation Index
>
> Fetch the complete documentation index at: https://docs.graphext.com/llms.txt
>
> Use this file to discover all available pages before exploring further.

# aggregate_tweets_by_author

> Group a dataset of tweets by author and calculate relevant author statistics.

Works like the generic `aggregate` step, but with a predefined set of aggregation functions. See the `ds_out` argument below for the columns generated in the resulting dataset.

## Usage

The following examples show how the step can be used in a recipe.

Aggregate tweets by author using standard Twitter column names

```stan
aggregate_tweets_by_author(ds) -> (ds_authors)
```

Aggregate with custom column mapping

```stan
aggregate_tweets_by_author(ds, {"add_referenced_accounts": false, "column_map": {"user_id": "author_id", "screen_name": "author_handler"}}) -> (ds_authors)
```

General syntax for using the step in a recipe. It shows the inputs the step expects to receive and the outputs it will produce. For further details see the sections below.

```stan
aggregate_tweets_by_author(ds_in: dataset, {
    "param": value,
    ...
}) -> (ds_out: dataset)
```

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced by name, e.g. `"churn-clf"`).

`ds_in` (dataset): A dataset where each row is a tweet.

`ds_out` (dataset): Result of the aggregation, where each row is a Twitter account.
It will include, for each author, up to the following columns, depending on the information present in the original dataset:

* `author_id`: Official Twitter ID
* `tweet_count`: Number of tweets by this author
* `handler`: Official Twitter handle
* `name`: User name
* `pic`: Link to user's profile picture
* `links`: A list of links mentioned by the user
* `dates`: A list of dates of published tweets by this author
* `tweet_ids`: The official Twitter IDs of the tweets published by the author
* `retweets`: The number of retweets received
* `favorites`: The number of favorites received
* `mention_ids`: List of other accounts (IDs) the author has *mentioned*
* `mention_names`: List of other accounts (names) the author has *mentioned*
* `rp_user_ids`: List of other accounts (IDs) the author has *replied* to
* `rp_user_names`: List of other accounts (names) the author has *replied* to
* `mentions`: The count of mentions received
* `replies`: The count of replies received
* `tweet_text`: The text of the author's tweets, concatenated.

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

`add_referenced_accounts`: Whether to add rows for accounts only "mentioned" in original tweets. If mentions or replies are recorded in the dataset (in columns `mention_ids`, `mention_names` and/or `rp_user_id`, `rp_user_name`), the step will add the corresponding accounts as rows in the result, even if they didn't have a tweet in the original dataset. It will also add `mentions` and `replies` columns recording how many times the accounts were mentioned or replied to.

`column_map`: Column map. If the names of any of your dataset's columns don't correspond to those we expect to find in a tweet dataset (e.g. originating in Twitter's own API), you can provide a mapping of the sort `{"your_column": "author_id"}`.
The expected column names are `[author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name, text]`.

Each value in the mapping is a column name to map to. Values must be one of the following:

* `author_id`
* `author_handler`
* `author_name`
* `author_avatar`
* `links`
* `date`
* `id`
* `retweets`
* `favorites`
* `mention_ids`
* `mention_names`
* `rp_user_id`
* `rp_user_name`
* `text`
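
To make the interplay of `column_map` and `add_referenced_accounts` more concrete, the following is a rough Python sketch of the kind of aggregation described above. It is not Graphext's implementation: the function `aggregate_by_author` and the sample data are hypothetical, only a subset of the output columns is computed, and it assumes that `retweets` and `favorites` are summed per author and that `mention_ids` holds a list of account IDs per tweet.

```python
from collections import Counter


def aggregate_by_author(tweets, column_map=None, add_referenced_accounts=True):
    """Hypothetical sketch: group tweet dicts by author and compute a few statistics."""
    column_map = column_map or {}
    authors = {}                # author_id -> aggregated row
    mention_counts = Counter()  # account_id -> number of times it was mentioned

    def new_row(author_id):
        return {"author_id": author_id, "tweet_count": 0, "tweet_ids": [],
                "retweets": 0, "favorites": 0, "mention_ids": []}

    for raw in tweets:
        # Rename custom columns to the expected names: {"your_column": "author_id"}.
        tweet = {column_map.get(key, key): value for key, value in raw.items()}
        row = authors.setdefault(tweet["author_id"], new_row(tweet["author_id"]))
        row["tweet_count"] += 1
        row["tweet_ids"].append(tweet["id"])
        row["retweets"] += tweet.get("retweets", 0)    # assumption: summed per author
        row["favorites"] += tweet.get("favorites", 0)  # assumption: summed per author
        mentioned = tweet.get("mention_ids") or []
        row["mention_ids"].extend(mentioned)
        mention_counts.update(mentioned)

    if add_referenced_accounts:
        # Accounts that were only mentioned get a row of their own (with zero tweets).
        for account_id in mention_counts:
            authors.setdefault(account_id, new_row(account_id))
        # Record how often each account was mentioned by others.
        for account_id, row in authors.items():
            row["mentions"] = mention_counts[account_id]

    return list(authors.values())


# Sample data using a custom author column name ("user_id").
tweets = [
    {"user_id": "1", "id": "t1", "retweets": 2, "favorites": 5, "mention_ids": ["2"]},
    {"user_id": "1", "id": "t2", "retweets": 1, "favorites": 0, "mention_ids": ["2", "3"]},
    {"user_id": "2", "id": "t3", "retweets": 0, "favorites": 1, "mention_ids": []},
]

authors = aggregate_by_author(tweets, column_map={"user_id": "author_id"})
for row in authors:
    print(row["author_id"], row["tweet_count"], row["mentions"])
```

In this sketch, account `"3"` never tweets but still appears in the result because it was mentioned. Passing `add_referenced_accounts=False` would leave only the two accounts that actually authored tweets and omit the `mentions` count. Replies (`rp_user_id`) could be handled analogously.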