| Step | Description |
| --- | --- |
| `add_noise` | Add noise to a column with numbers or lists of numbers |
| `calculate` | Evaluates a formula containing basic arithmetic over a dataset’s columns |
| `cast` | Interprets and changes a column’s data to another (semantic) type |
| `concatenate` | Concatenate columns as text or lists, with an optional separator as well as prefix and postfix |
| `count_unique` | Counts the number of unique elements in each list/array of the input column |
| `derive_column` | Derive a new column with a custom JS script |
| `discretize_on_quantiles` | Discretize column into bins based on quantiles |
| `discretize_on_values` | Discretize column by binning its values using explicitly specified cut points |
| `divide` | Divide two or more numeric columns in the given order |
| `equal` | Check the row-wise equality of all input columns |
| `explode` | Explode (extract) items from column(s) of lists into separate rows |
| `extract_date_component` | Extract a component such as day, week, weekday etc. from a date column |
| `extract_emoji` | Parse texts and extract their emoji |
| `extract_entities` | Parse texts and extract the entities mentioned (persons, organizations etc.) |
| `extract_hashtags` | Parse texts and extract any hashtags mentioned |
| `extract_json_values` | Extract values from JSON columns using JsonPath |
| `extract_keywords` | Parse and extract keywords from texts |
| `extract_mentions` | Parse texts and extract any mentions detected |
| `extract_ngrams` | Parse texts and extract their n-grams |
| `extract_range` | Create a copy of a column, nullifying values outside a specified range |
| `extract_regex` | Extract parts of texts detected using regular expressions |
| `extract_text_features` | Parse and process texts to extract multiple features at once |
| `extract_url_components` | Extract components from a URL |
| `is_missing` | Check for missing values in a given column |
| `label_bios` | Categorize people into fields of occupation using their bios (biographies) |
| `label_categories` | Relabel categories based on the top terms in each category |
| `label_encode` | Encode categories with values between 0 and N-1, where N is the number of unique categories |
| `label_holidays` | Indicate whether there are any holidays for given (date, location) pairs |
| `label_political_subtopics` | Categorize the political sub-topics of texts in Spanish |
| `label_political_topics` | Categorize the political topics of texts in Spanish |
| `label_texts_containing` | Categorize texts containing specific keywords with custom labels |
| `label_texts_containing_from_query` | Label texts given an Elasticsearch-like query string |
| `length` | Calculates the length of lists (number of elements) or texts/categories (number of characters) |
| `make_constant` | Creates a new constant column (with a single unique value) of the same length as the input column |
| `math_func` | Applies a mathematical function to the values of a (single) numeric column |
| `merge_similar_semantics` | Group categories with similar meanings |
| `merge_similar_spellings` | Group categories with similar spellings |
| `multiply` | Multiply two or more numeric columns |
| `normalize` | Normalizes a numerical column by subtracting the mean and dividing by its standard deviation |
| `observed_duration` | Calculate the duration between two dates and determine whether an event was observed before a specified observation da… |
| `order_categories` | (Re-)order the categories of a categorical column |
| `pandas_func` | Applies an arbitrary pandas-supported function to the values of an input column |
| `pct_change` | Calculate percentage change between consecutive numbers in a numeric column |
| `percentile_rank` | Convert the values in a numeric or date column into their percentile rank |
| `query` | Generate a boolean column based on a query string, marking rows that match the condition |
| `replace_missing` | Replace missing values (NaNs) with either a specified constant value or the result of a given function |
| `replace_regex` | Replace parts of text detected with a regular expression |
| `replace_values` | Replace specified values in a column with new ones |
| `scale` | Scales the values of a numerical column to lie between a specified minimum and maximum |
| `segment_rows` | Create a segmentation using Graphext’s advanced query syntax (similar to Elasticsearch) |
| `slice` | Extract a range/slice of elements from a column of texts or lists |
| `split_string` | Split a single column containing texts into two |
| `subtract` | Subtract two or more numeric columns |
| `sum` | Calculate the row-wise sum of numeric columns |
| `time_interval` | Calculates the duration of a time interval between two dates (datetimes/timestamps) |
| `tokenize` | Parse texts and separate them into lists of tokens (words, lemmas, etc.) |
| `trim_frequencies` | Remove values whose frequencies (counts) are above/below a given threshold |
| `unique` | Extracts the unique elements in each list/array |
| `unpack_list` | Unpack (extract) items from a column of lists into separate columns |
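
Some of the steps listed above are described precisely enough that their arithmetic can be sketched directly. The following is a minimal pure-Python illustration of what `normalize`, `scale` and `label_encode` compute according to their descriptions. It is not the steps' actual implementation or recipe syntax. The function signatures, the use of the population standard deviation, and the first-appearance ordering of category codes are all assumptions made for this example, and missing values are not handled.

```python
from statistics import fmean, pstdev


def normalize(values):
    """Subtract the mean and divide by the standard deviation.

    Assumption: population standard deviation (pstdev); the step may use
    the sample standard deviation instead. A constant column (std == 0)
    would raise ZeroDivisionError here.
    """
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


def scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they lie between new_min and new_max.

    Assumption: plain min-max scaling; a constant column (max == min)
    would raise ZeroDivisionError here.
    """
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def label_encode(categories):
    """Map each category to an integer between 0 and N-1.

    Assumption: codes are assigned in order of first appearance; the step
    may order categories differently (for example alphabetically).
    """
    codes = {}
    for category in categories:
        codes.setdefault(category, len(codes))
    return [codes[category] for category in categories]
```

For example, `scale([2, 4, 6], 0, 10)` maps the smallest value to 0 and the largest to 10, and `label_encode(["b", "a", "b", "c"])` assigns one integer code per distinct category, so N = 3 unique categories yield codes in the range 0 to 2.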