Transform

Step	Fast	Description
add_noise		Add noise to a column with numbers or lists of numbers
calculate		Evaluates a formula containing basic arithmetic over a dataset’s columns
cast	⚡	Interprets and changes a column’s data to another (semantic) type
concatenate	⚡	Concatenate columns as text or lists with optional separator as well as pre- and postfix
count_unique	⚡	Counts the number of unique elements in each list/array of the input column
derive_column	⚡	Derive a new column with a custom JS script
discretize_on_quantiles	⚡	Discretize column into bins based on quantiles
discretize_on_values	⚡	Discretize column by binning its values using explicitly specified cut points
divide	⚡	Divide two or more numeric columns in given order
equal	⚡	Check the row-wise equality of all input columns
explode	⚡	Explode (extract) items from column(s) of lists into separate rows
extract_date_component	⚡	Extract a component such as day, week, weekday etc. from a date column
extract_emoji		Parse texts and extract their emoji
extract_entities		Parse texts and extract the entities mentioned (persons, organizations etc.)
extract_hashtags		Parse texts and extract any hashtags mentioned
extract_json_values	⚡	Extract values from JSON columns using JsonPath
extract_keywords		Parse and extract keywords from texts
extract_mentions		Parse texts and extract any mentions detected
extract_ngrams		Parse texts and extract their n-grams
extract_range	⚡	Create a copy of a column nullifying values outside a specified range
extract_regex	⚡	Extract parts of texts detected using regular expressions
extract_text_features		Parse and process texts to extract multiple features at once
extract_url_components		Extract components from a URL
is_missing	⚡	Check for missing values in a given column
label_bios		Categorize people into fields of occupation using their bios (biographies)
label_categories	⚡	Relabel categories based on the top terms in each category
label_encode		Encode categories with values between 0 and N-1, where N is the number of unique categories
label_holidays		Indicate whether there are any holidays for given (date, location) pairs
label_political_subtopics		Categorize the political sub-topics of texts in Spanish
label_political_topics		Categorize the political topics of texts in Spanish
label_texts_containing		Categorize texts containing specific keywords with custom labels
label_texts_containing_from_query		Label texts given an Elasticsearch-like query string
length	⚡	Calculates the length of lists (number of elements) or texts/categories (number of characters)
make_constant	⚡	Creates a new constant column (with a single unique value) of the same length as the input column
math_func		Applies a mathematical function to the values of a (single) numeric column
merge_similar_semantics		Group categories with similar meanings
merge_similar_spellings		Group categories with similar spellings
multiply	⚡	Multiply two or more numeric columns
normalize	⚡	Normalizes a numerical column by subtracting the mean and dividing by its standard deviation
observed_duration	⚡	Calculate the duration between two dates and determine whether an event was observed before a specified observation da…
order_categories	⚡	(Re-)order the categories of a categorical column
pandas_func		Applies an arbitrary pandas supported function to the values of an input column
pct_change		Calculate percentage change between consecutive numbers in a numeric column
percentile_rank	⚡	Convert the values in a numeric or date column into their percentile rank
query	⚡	Generate a boolean column based on a query string, marking rows that match the condition
replace_missing	⚡	Replace missing values (NaNs) with either a specified constant value or the result of a given function
replace_regex	⚡	Replace parts of text detected with a regular expression
replace_values	⚡	Replace specified values in a column with new ones
scale	⚡	Scales the values of a numerical column to lie between a specified minimum and maximum
segment_rows	⚡	Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch)
slice	⚡	Extract a range/slice of elements from a column of texts or lists
split_string	⚡	Split a single column containing texts into two
subtract	⚡	Subtract two or more numeric columns
sum	⚡	Calculate the row-wise sum of numeric columns
time_interval	⚡	Calculates the duration of a time interval between two dates (datetimes/timestamps)
tokenize		Parse texts and separate them into lists of tokens (words, lemmas, etc.)
trim_frequencies		Remove values whose frequencies (counts) are above/below a given threshold
unique	⚡	Extracts the unique elements in each list/array
unpack_list		Unpack (extract) items from a column of lists into separate columns
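
To make a few of the descriptions above concrete, here is a minimal Python sketch of the arithmetic that normalize, scale and label_encode describe. It is not the steps' actual implementation or syntax: the function signatures, the use of the population standard deviation, and the order in which category codes are assigned are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) std. deviation
    # Note: a constant column has sigma == 0 and would raise ZeroDivisionError.
    return [(v - mu) / sigma for v in values]


def scale(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    span = hi - lo  # a constant column (span == 0) is not handled here
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


def label_encode(categories):
    """Map each category to an integer in 0..N-1, N = number of unique categories."""
    codes = {}
    # Assumption: codes are assigned in order of first appearance.
    return [codes.setdefault(c, len(codes)) for c in categories]


if __name__ == "__main__":
    print(normalize([1, 2, 3]))
    print(scale([2, 4, 6], 0, 10))
    print(label_encode(["b", "a", "b", "c"]))
```

The sketch uses plain lists rather than dataset columns so that it stays self-contained; how each step treats missing values or constant columns is not specified in the table and is left out here.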

Prepare

Report

Analyse