datebin_and_agg
- outbreak_tools.datebin_and_agg(df, weights=None, freq='7D', rolling=1, startdate=None, enddate=None, column='prevalence', norm=True, variance=False, log=False, trustna=1)
Gather and aggregate samples into signals.
- Parameters
df -- A multi-indexed pandas dataframe; the first index level is assumed to be a date and the second a categorical.
weights -- A pandas series of sample weights. Use None for clinical data in df[column]; for wastewater data, use the weights from get_ww_weights.
freq -- Length of date bins as a string (default '7D', i.e. seven-day bins).
rolling -- How to smooth the data; an int will be treated as a number of bins to take the rolling mean over, and an array as a kernel.
startdate -- Start of date bin range as YYYY-MM-DD string.
enddate -- End of date bin range as YYYY-MM-DD string.
column -- Data column to aggregate.
norm -- Whether to normalize so that aggregated values across all categories in a date bin sum to 1.
variance -- Whether to return the rolling variances along with the aggregated values.
log -- Whether to do the aggregation in log space (geometric vs arithmetic mean).
trustna -- How much weight to place on the nan=0 assumption.
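
The rolling and log options can be illustrated with plain NumPy and pandas. This is a minimal sketch of the general techniques named in the parameter descriptions (rolling mean, kernel smoothing, arithmetic vs geometric mean), not the implementation used by datebin_and_agg. The toy values, the kernel, and its orientation are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-bin values for a single category.
values = pd.Series([1.0, 2.0, 4.0, 8.0])

# An int for rolling: a plain rolling mean over that many bins.
rolling_mean = values.rolling(2).mean()

# An array for rolling: a weighted moving average with a kernel.
# Assumption: the kernel sums to 1 and its last entry weights the newest bin.
kernel = np.array([0.25, 0.75])
kernel_smoothed = np.convolve(values.to_numpy(), kernel[::-1], mode="valid")

# log=False corresponds to an arithmetic mean, log=True to a geometric mean
# (the mean is taken in log space, then exponentiated).
arithmetic = values.mean()
geometric = np.exp(np.log(values).mean())
```

The geometric mean is never larger than the arithmetic mean for positive values, so aggregating in log space gives less influence to occasional large samples.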
- Returns
A pandas dataframe of aggregated values with rows corresponding to date bins and columns corresponding to categories.
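
To make the input and output shapes concrete, the following sketch reproduces the general idea with pandas alone: bin a (date, category) multi-indexed frame into 7-day windows, average within each bin, and normalize each row to sum to 1. It does not call datebin_and_agg and is not its implementation. The toy data, the index names date and lineage, and the order of averaging and normalizing are assumptions.

```python
import pandas as pd

# Hypothetical toy data: (date, lineage) multi-index with a 'prevalence' column.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2023-01-01"), "A"),
        (pd.Timestamp("2023-01-03"), "B"),
        (pd.Timestamp("2023-01-09"), "A"),
        (pd.Timestamp("2023-01-10"), "B"),
    ],
    names=["date", "lineage"],
)
df = pd.DataFrame({"prevalence": [0.2, 0.6, 0.3, 0.1]}, index=idx)

# Bin dates into 7-day windows and average within each (bin, category) pair,
# then pivot categories into columns: rows are date bins, columns are categories.
flat = df.reset_index()
binned = (
    flat.groupby([pd.Grouper(key="date", freq="7D"), "lineage"])["prevalence"]
    .mean()
    .unstack("lineage")
)

# Normalize so the values across all categories in a date bin sum to 1.
normalized = binned.div(binned.sum(axis=1), axis=0)
print(normalized)
```

The result has one row per date bin and one column per category, which matches the layout described under Returns.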