Resamples a dataset of events or time series to the desired frequency.
The step works with data in both a “tall” and a “wide” format.
Examples
This example resamples a dataset `ds` of shopping events in “tall” format to a weekly frequency, calculating the number of events per customer per week, the weekly total and average spend, and the percentage of purchases in the category “pet food”. The original value columns to be aggregated are “price” and “category”. It also requests the output in “wide” format, which is the most convenient in Graphext. Setting `fill_gaps` to `true` ensures that the resampled data contains rows for all weeks between a customer’s first and last event, even those with no events.

Step inputs can be columns (`ds.first_name`), datasets (`ds` or `ds[["first_name", "last_name"]]`) or models (referenced by name, e.g. `"churn-clf"`).
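The effect of weekly resampling with and without gap filling can be sketched in plain Python. This is only an illustration of the idea, not the step's actual implementation: the `events` data, the helper names and the choice to label each week by its Monday are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical shopping events in "tall" format: (customer, day, price).
events = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 3), 20.0),
    ("a", date(2024, 1, 16), 5.0),
    ("b", date(2024, 1, 2), 8.0),
]


def week_start(day):
    """Monday of the week containing `day` (an assumed anchoring choice)."""
    return day - timedelta(days=day.weekday())


def resample_weekly(events, fill_gaps):
    """Return (customer, week, n_events, total_spend) rows."""
    # Collect the prices of each customer's events per week.
    per_customer = defaultdict(lambda: defaultdict(list))
    for customer, day, price in events:
        per_customer[customer][week_start(day)].append(price)

    rows = []
    for customer, weeks in per_customer.items():
        if fill_gaps:
            # Every week between the customer's first and last event.
            current, last = min(weeks), max(weeks)
            periods = []
            while current <= last:
                periods.append(current)
                current += timedelta(weeks=1)
        else:
            # Only weeks in which at least one event occurred.
            periods = sorted(weeks)
        for week in periods:
            prices = weeks.get(week, [])
            rows.append((customer, week, len(prices), sum(prices)))
    return rows


sparse = resample_weekly(events, fill_gaps=False)
dense = resample_weekly(events, fill_gaps=True)
```

With these four events, `sparse` holds three rows, while `dense` gains one extra row for customer `a` in the week of 8 January, with zero events and zero spend.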
Inputs
Outputs
The output columns are determined by the `aggregations` parameter. The timestamp and aggregated value columns will have scalar values if the output was requested in “tall” format, or lists if in “wide” format.

Usage: `step(..., {"param": "value", ...}) -> (output)`
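To make the difference between the “tall” and “wide” output formats concrete, the following sketch collapses tall rows (scalar values, one row per customer and period) into wide rows (one row per customer, list-valued columns). The column names and the `to_wide` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical resampled result in "tall" format: one row per customer and week.
tall = [
    {"customer": "a", "week": "2024-01-01", "total_spend": 30.0},
    {"customer": "a", "week": "2024-01-08", "total_spend": 5.0},
    {"customer": "b", "week": "2024-01-01", "total_spend": 8.0},
]


def to_wide(rows, id_col):
    """Collapse tall rows into one row per ID, with list-valued columns."""
    wide = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                wide[row[id_col]][col].append(value)
    return [{id_col: key, **cols} for key, cols in wide.items()]


wide = to_wide(tall, "customer")
```

Here `wide` contains two rows, one per customer, and the `week` and `total_spend` entries of customer `a` are lists with two elements each.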
Parameters
Array items
Examples
The table below lists the aliases accepted by the `freq` parameter. Also see the corresponding Pandas documentation for more details on each frequency.

Alias | Description
---|---
B | Business day (weekday)
D | Calendar day (absolute)
W | Week, optionally anchored on a day of the week (W-SUN…)
ME | Calendar month end (last day of month)
SME | Semi-month end (15th and end of month)
BME | Last business day of month
MS | Calendar month start (first day of month)
SMS | Semi-month start (1st and 15th)
BMS | First business day of month
QE | Calendar quarter end
BQE | Business quarter end
QS | Calendar quarter start
BQS | Business quarter start
YE/A/Y | Calendar year end
BYE/BA/BY | Business year end
YS/AS | Calendar year start
BYS/BAS | Business year start
h/H | Hour
bh/BH | Business hour
min/T | Minute
s/S | Second
ms/L | Millisecond
us/U | Microsecond
ns/N | Nanosecond
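Since the aliases follow Pandas conventions, their meaning can be explored directly with `pandas.date_range`. The short sketch below assumes pandas is installed and uses only aliases that are stable across versions; newer spellings such as `ME` are only recognised by recent pandas releases.

```python
import pandas as pd

# "W-SUN": weekly periods anchored on Sundays.
weeks = pd.date_range("2024-01-01", periods=3, freq="W-SUN")

# "MS": calendar month start; a mid-month start date rolls forward.
months = pd.date_range("2024-01-15", periods=2, freq="MS")

# "B": business days, skipping Saturday and Sunday.
bdays = pd.date_range("2024-01-05", periods=2, freq="B")

print(list(weeks.strftime("%Y-%m-%d")))
print(list(months.strftime("%Y-%m-%d")))
print(list(bdays.strftime("%Y-%m-%d")))
```

Because 1 January 2024 is a Monday, the weekly range starts on Sunday 7 January, and the business-day range jumps from Friday 5 January to Monday 8 January.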
B
D
W
M
ME
SM
SME
BM
BME
MS
SMS
BMS
Q
QE
BQ
BQE
QS
BQS
A
Y
YE
BA
BY
BYE
AS
BAS
BYS
h
H
bh
BH
T
min
S
s
L
ms
U
us
N
ns
MON
TUE
WED
THU
FRI
SAT
SUN
JAN
FEB
MAR
APR
MAY
JUN
JUL
AUG
SEP
OCT
NOV
DEC
tall
wide
Whether to fill gaps in the resampled data with NaN/0 values. If set to `false`, the resampled data will only contain rows for which there are events in the original data. If set to `true`, the resampled data will contain rows for all periods in the resampled frequency, with NaN/0 values for periods with no events.

For example, a `sum` aggregation of column A calculates a single total by adding up all the values in A belonging to each group.

In contrast to the more generic `aggregate` and `group_by` steps, for time series resampling only functions returning scalar values are supported. Allowed options for the `func` parameter are:

- `n`, `size` or `count`: calculate the number of rows in the group
- `sum`: sum total of values
- `mean`: take mean of values
- `max`: take max of values
- `min`: take min of values
- `first`: take first item found
- `last`: take last item found
- `n_unique`: count the number of unique values
- `concatenate`: convert all values to text and concatenate them into one long text
- `count_where`: number of rows in which the column matches a value; needs the parameter `value` with the value that you want to count
- `percent_where`: percentage of rows in which the column matches a value; needs the parameter `value` with the value that you want to count

Note that for `count_where` and `percent_where` an additional `value` parameter is required.
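The scalar aggregation functions listed above can be approximated in a few lines of Python. This is a minimal sketch for intuition only: the `AGGREGATIONS` registry and the `aggregate` helper are hypothetical names, only a subset of the functions is covered, and returning NaN for an empty group in `percent_where` is an assumption.

```python
from statistics import mean

# Hypothetical registry mirroring some of the function names listed above.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "n_unique": lambda values: len(set(values)),
}


def aggregate(values, func, value=None):
    """Reduce the values of one group to a single scalar."""
    if func == "count_where":
        # Number of rows matching `value`.
        return sum(1 for v in values if v == value)
    if func == "percent_where":
        # Percentage of rows matching `value` (NaN for empty groups, by assumption).
        if not values:
            return float("nan")
        return 100 * sum(1 for v in values if v == value) / len(values)
    return AGGREGATIONS[func](values)


# One group of events, e.g. a single customer's purchases in one week.
categories = ["pet food", "toys", "pet food", "books"]
prices = [10.0, 20.0, 5.0, 8.0]

total = aggregate(prices, "sum")
n_pet = aggregate(categories, "count_where", value="pet food")
pct_pet = aggregate(categories, "percent_where", value="pet food")
```

For this group the total spend is 43.0, two purchases match “pet food”, and they make up 50% of the rows. Every function returns one scalar per group, which is what distinguishes resampling from aggregations that return lists.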
Item properties

Each item selects its aggregation function via the `"func"` parameter. If the aggregation function accepts further arguments, like the `"value"` parameter in the case of `count_where` and `percent_where`, these also need to be provided.

Properties
n
size
count
sum
mean
n_unique
count_where
percent_where
concatenate
max
min
first
last
list
Examples