aggregate_tweets_by_author

Works like the generic aggregate step, but with a predefined set of aggregation functions. See the ds_out argument below for the columns generated in the resulting dataset.

Usage

The following shows how the step can be used in a recipe.

Examples
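
As a minimal sketch following the signature documented on this page, the call below aggregates a tweet dataset into one row per author using the default configuration. The dataset names ds and ds_authors are hypothetical placeholders, not names the step requires.

```
aggregate_tweets_by_author(ds) -> (ds_authors)
```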

Signature

General syntax for using the step in a recipe. It shows the inputs the step expects and the outputs it produces. For further details, see the sections below.

aggregate_tweets_by_author(ds_in: dataset, {
    "param": value,
    ...
}) -> (ds_out: dataset)

Inputs & Outputs

The following are the inputs expected by the step and the outputs it produces. These are generally columns (ds.first_name), datasets (ds or ds[["first_name", "last_name"]]) or models (referenced by name e.g. "churn-clf").

Inputs

ds_in

dataset

required

A dataset where each row is a tweet.

Outputs

ds_out

dataset

required

Result of the aggregation, where each row is a Twitter account. Depending on the information present in the original dataset, it includes up to the following columns for each author:

author_id: Official Twitter ID
tweet_count: Number of tweets by this author
handler: Official Twitter handle
name: User name
pic: Link to user’s profile picture
links: A list of links mentioned by the user
dates: A list of dates of published tweets by this author
tweet_ids: The official Twitter IDs of the tweets published by the author
retweets: The number of retweets received
favorites: The number of favorites received
mention_ids: List of other accounts (IDs) the author has mentioned
mention_names: List of other accounts (names) the author has mentioned
rp_user_ids: List of other accounts (IDs) the author has replied to
rp_user_names: List of other accounts (names) the author has replied to
mentions: The count of mentions received
replies: The count of replies received
tweet_text: The text of the author’s tweets, concatenated.

Configuration

The following parameters can be used to configure the behaviour of the step by including them in a JSON object as the last “input” to the step, i.e. step(..., {"param": "value", ...}) -> (output).

Parameters

add_referenced_accounts

boolean

default: true

Whether to add rows for accounts that are only mentioned or replied to in the original tweets. If mentions or replies are recorded in the dataset (in columns mention_ids, mention_names and/or rp_user_id, rp_user_name), the step adds the corresponding accounts as rows in the result, even if they have no tweet in the original dataset. It also adds mentions and replies columns recording how many times each account was mentioned or replied to.
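
To illustrate, the sketch below sets the parameter to false, which, going by the description above, should restrict the result to accounts that authored at least one tweet in the input. The dataset names ds and ds_authors are hypothetical, and the exact effect on the mentions and replies columns is not confirmed here.

```
aggregate_tweets_by_author(ds, {
    "add_referenced_accounts": false
}) -> (ds_authors)
```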

column_map

object

A mapping from your dataset’s column names to the names the step expects. If any of your columns are named differently from those expected in a tweet dataset (e.g. one originating in Twitter’s own API), provide a mapping of the form {"your_column": "author_id"}. The expected column names are:

[author_id, author_handler, author_name, author_avatar, links, date, id, retweets, favorites, mention_ids, mention_names, rp_user_id, rp_user_name, text]
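
A mapping might look like the following sketch. The source column names user_id, screen_name and created_at are hypothetical, standing in for whatever your dataset actually uses; the names on the right-hand side are taken from the expected list above. The dataset names ds and ds_authors are likewise placeholders.

```
aggregate_tweets_by_author(ds, {
    "column_map": {
        "user_id": "author_id",
        "screen_name": "author_handler",
        "created_at": "date"
    }
}) -> (ds_authors)
```

Columns that already use the expected names should not need an entry in the mapping.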

