# aggregate_tweets_by_author

> Group a dataset of tweets by author and calculate relevant author statistics. 

Works like the generic `aggregate` step, but with a predefined set of aggregation functions. See the `ds_out` output below
for the columns generated in the resulting dataset.

## Usage

The following examples show how the step can be used in a recipe.

<Accordion title="Examples" icon="code" defaultOpen="true">
  <Tabs>
    <Tab title="Example 1">
      Aggregate tweets by author using standard Twitter column names

      ```stan theme={null}
      aggregate_tweets_by_author(ds) -> (ds_authors)
      ```
    </Tab>

    <Tab title="Example 2">
      Aggregate with custom column mapping

      ```stan theme={null}
      aggregate_tweets_by_author(ds, {"add_referenced_accounts": false, "column_map": {"user_id": "author_id", "screen_name": "author_handler"}}) -> (ds_authors)
      ```
    </Tab>

    <Tab title="Signature">
      General syntax for using the step in a recipe. Shows the inputs the step expects and the outputs it produces. For further details see the sections below.

      ```stan theme={null}
      aggregate_tweets_by_author(ds_in: dataset, {
          "param": value,
          ...
      }) -> (ds_out: dataset)
      ```
    </Tab>
  </Tabs>
</Accordion>

## Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally
columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced
by name e.g. `"churn-clf"`).

<Accordion title="Inputs" icon="right-to-bracket">
  <ParamField path="ds_in" type="dataset" required>
    A dataset where each row is a tweet.
  </ParamField>
</Accordion>

<Accordion title="Outputs" icon="right-from-bracket">
  <ParamField path="ds_out" type="dataset" required>
    Result of the aggregation, where each row is a Twitter account. For each author it includes up to the following columns,
    depending on the information present in the original dataset:

    * `author_id`: Official Twitter ID
    * `tweet_count`: Number of tweets by this author
    * `handler`: Official Twitter handle
    * `name`: User name
    * `pic`: Link to user's profile picture
    * `links`: A list of links mentioned by the user
    * `dates`: A list of dates of published tweets by this author
    * `tweet_ids`: The official Twitter IDs of the tweets published by the author
    * `retweets`: The number of retweets received
    * `favorites`: The number of favorites received
    * `mention_ids`: List of other accounts (IDs) the author has *mentioned*
    * `mention_names`: List of other accounts (names) the author has *mentioned*
    * `rp_user_ids`: List of other accounts (IDs) the author has *replied* to
    * `rp_user_names`: List of other accounts (names) the author has *replied* to
    * `mentions`: The count of mentions received
    * `replies`: The count of replies received
    * `tweet_text`: The text of the author's tweets, concatenated.
  </ParamField>
</Accordion>
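
To make the shape of `ds_out` more concrete, here is a minimal plain-Python sketch of a group-by-author aggregation. It is not Graphext's implementation: the function `aggregate_by_author`, the toy rows, and the choice of aggregations (count, first value, list, sum, space-joined text) are assumptions made for illustration, and only a subset of the output columns is shown.

```python
from collections import defaultdict

# Toy tweets using the expected input column names listed on this page.
tweets = [
    {"author_id": "1", "author_handler": "ann", "id": "t1", "retweets": 3, "favorites": 5, "text": "hello"},
    {"author_id": "1", "author_handler": "ann", "id": "t2", "retweets": 1, "favorites": 0, "text": "world"},
    {"author_id": "2", "author_handler": "bob", "id": "t3", "retweets": 0, "favorites": 2, "text": "hi"},
]


def aggregate_by_author(rows):
    """Group tweet rows by author_id (hypothetical helper, assumed aggregations)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["author_id"]].append(row)

    authors = []
    for author_id, group in groups.items():
        authors.append({
            "author_id": author_id,
            "tweet_count": len(group),                           # number of tweets
            "handler": group[0]["author_handler"],               # first value seen
            "tweet_ids": [r["id"] for r in group],               # list of tweet IDs
            "retweets": sum(r["retweets"] for r in group),       # assumed: summed
            "favorites": sum(r["favorites"] for r in group),     # assumed: summed
            "tweet_text": " ".join(r["text"] for r in group),    # assumed separator
        })
    return authors


authors = aggregate_by_author(tweets)
```

Each element of `authors` corresponds to one row of the resulting dataset, with one entry per output column.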

## Configuration

The following parameters can be used to configure the behaviour of the step by including them in
a JSON object as the last "input" to the step, i.e. `step(..., {"param": "value", ...}) -> (output)`.

<Accordion title="Parameters" defaultOpen="true" icon="sliders">
  <ParamField path="add_referenced_accounts" type="boolean" default="true">
    Whether to add rows for accounts that are only referenced (mentioned or replied to) in the original tweets.
    If mentions or replies are recorded in the dataset (in columns `mention_ids`, `mention_names`
    and/or `rp_user_id`, `rp_user_name`), the corresponding accounts are added as rows in the result,
    even if they have no tweets in the original dataset.

    Also adds `mentions` and `replies` columns recording how many times each account was
    mentioned or replied to.
  </ParamField>

  <ParamField path="column_map" type="object">
    Column Map.
    If the names of any of your dataset's columns don't correspond to those we expect
    to find in a tweet dataset (e.g. originating in Twitter's own API), you can provide
    a mapping of the form `{"your_column": "author_id"}`, where each key is a column name
    in your dataset and each value is the expected name it corresponds to.

    The expected column names are `[author_id, author_handler, author_name, author_avatar,
    links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name,
    text]`.

    <Accordion title="Pattern properties">
      <ParamField path="" type="string">
        Column name to map.

        Values must be one of the following:

        `author_id` `author_handler` `author_name` `author_avatar` `links` `date` `id` `retweets` `favorites` `mention_ids` `mention_names` `rp_user_id` `rp_user_name` `text`
      </ParamField>
    </Accordion>
  </ParamField>
</Accordion>
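
The two parameters above can be illustrated with a short plain-Python sketch. It is a rough approximation under stated assumptions, not the step's actual code: the helpers `apply_column_map` and `referenced_accounts` are hypothetical, `mention_ids` is assumed to hold a list of account IDs per tweet, and `rp_user_id` is assumed to hold a single ID or nothing.

```python
from collections import Counter

# Hypothetical raw rows whose column names differ from the expected ones.
raw_tweets = [
    {"user_id": "1", "screen_name": "ann", "mention_ids": ["2", "3"], "rp_user_id": None},
    {"user_id": "2", "screen_name": "bob", "mention_ids": ["3"], "rp_user_id": "1"},
]

# Same shape as the column_map parameter: {"your_column": "expected_column"}.
column_map = {"user_id": "author_id", "screen_name": "author_handler"}


def apply_column_map(rows, mapping):
    """Rename keys found in the mapping; leave all other keys untouched."""
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


def referenced_accounts(rows):
    """Count how often each account ID is mentioned or replied to (assumed logic)."""
    mentions, replies = Counter(), Counter()
    for row in rows:
        mentions.update(row.get("mention_ids") or [])
        if row.get("rp_user_id"):
            replies[row["rp_user_id"]] += 1
    return mentions, replies


tweets = apply_column_map(raw_tweets, column_map)
mentions, replies = referenced_accounts(tweets)

# Accounts that are referenced but authored no tweet in the dataset.
author_ids = {row["author_id"] for row in tweets}
referenced_only = sorted((set(mentions) | set(replies)) - author_ids)
```

In this sketch, `referenced_only` holds the accounts that would appear as extra rows when `add_referenced_accounts` is `true`, and the two counters play the role of the `mentions` and `replies` columns.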
