datebin_and_agg

outbreak_tools.datebin_and_agg(df, weights=None, freq='7D', rolling=1, startdate=None, enddate=None, column='prevalence', norm=True, variance=False, log=False, trustna=1)

Gather samples into date bins and aggregate them into per-category signals.

Parameters
  • df -- A multi-indexed pandas dataframe; the first index level is assumed to hold dates and the second a categorical label.

  • weights -- A pandas series of sample weights. Use None when df[column] holds clinical data, and the output of get_ww_weights when it holds wastewater data.

  • freq -- Length of each date bin, as a frequency string such as '7D' (weekly bins).

  • rolling -- How to smooth the binned data. An int is treated as the number of bins over which to take a rolling mean; an array is treated as a smoothing kernel.

  • startdate -- Start of the date bin range, as a YYYY-MM-DD string.

  • enddate -- End of the date bin range, as a YYYY-MM-DD string.

  • column -- Data column to aggregate.

  • norm -- Whether to normalize so that aggregated values across all categories in a date bin sum to 1.

  • variance -- Whether to return the rolling variances along with the aggregated values.

  • log -- Whether to do the aggregation in log space (geometric rather than arithmetic mean).

  • trustna -- How much weight to place on the assumption that missing (NaN) values are 0.
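The two forms accepted by rolling can be illustrated with a minimal sketch. It is not outbreak_tools' code: smooth_bins is a hypothetical helper, and the choices it makes (a centered window, averaging whatever neighbours exist at the edges, normalizing the kernel to sum to 1, zero padding past the ends) are assumptions rather than documented behaviour of datebin_and_agg.

```python
import numpy as np
import pandas as pd


def smooth_bins(binned: pd.DataFrame, rolling) -> pd.DataFrame:
    """Smooth binned signals (rows = date bins, columns = categories).

    Hypothetical helper for illustration; not part of outbreak_tools.
    """
    if isinstance(rolling, int):
        # Int: rolling mean over `rolling` bins. Centering the window is an
        # assumption; min_periods=1 lets edge bins average the neighbours
        # that exist instead of becoming NaN.
        return binned.rolling(rolling, center=True, min_periods=1).mean()

    # Array: treat it as a kernel, normalized so its weights sum to 1.
    kernel = np.asarray(rolling, dtype=float)
    kernel = kernel / kernel.sum()
    # mode="same" returns one value per bin (requires at least len(kernel)
    # bins); values beyond either end count as 0, which damps the edge bins.
    smoothed = {
        col: np.convolve(binned[col].to_numpy(dtype=float), kernel, mode="same")
        for col in binned.columns
    }
    return pd.DataFrame(smoothed, index=binned.index)
```

With rolling=1 the signal passes through unchanged; a larger int or a wider kernel trades responsiveness for smoothness. np.convolve flips the kernel, so an asymmetric kernel shifts the signal in the opposite direction from a plain weighted window.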

Returns

A pandas dataframe of aggregated values with rows corresponding to date bins and columns corresponding to categories.
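To make the interaction of freq, weights, column, norm and log concrete, here is a simplified re-implementation of the binning and aggregation steps in plain pandas. It is a sketch under stated assumptions, not outbreak_tools' implementation: datebin_sketch is a hypothetical name; rolling, variance and trustna are left out; weights is assumed to be aligned row-for-row with df; freq is assumed to be a whole number of days such as '7D'; bins are assumed to start at startdate (or the earliest date); and each category's value in a bin is taken to be the weighted mean of column.

```python
import numpy as np
import pandas as pd


def datebin_sketch(df, weights=None, freq="7D", startdate=None, enddate=None,
                   column="prevalence", norm=True, log=False):
    """Hypothetical simplification of date binning plus weighted aggregation.

    Assumes df has a two-level index (date, category); rolling, variance and
    trustna are omitted. Not the outbreak_tools implementation.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(df.index.get_level_values(0)))
    cats = np.asarray(df.index.get_level_values(1))
    values = df[column].to_numpy(dtype=float)
    # Assumption: weights (if given) line up row-for-row with df.
    w = np.ones(len(df)) if weights is None else np.asarray(weights, dtype=float)

    start = dates.min() if startdate is None else pd.Timestamp(startdate)
    end = dates.max() if enddate is None else pd.Timestamp(enddate)
    keep = np.asarray((dates >= start) & (dates <= end))

    # Assign each sample to the bin starting at start + k * freq.
    width_days = pd.Timedelta(freq).days  # assumes a whole-day freq
    bin_no = (dates - start).days // width_days
    bin_start = start + pd.to_timedelta(bin_no * width_days, unit="D")

    if log:
        # Averaging logs and exponentiating gives a (weighted) geometric mean;
        # this requires strictly positive values.
        values = np.log(values)

    frame = pd.DataFrame({
        "bin": np.asarray(bin_start)[keep],
        "category": cats[keep],
        "wv": (values * w)[keep],
        "weight": w[keep],
    })
    # Weighted mean per (bin, category), reshaped to bins x categories.
    sums = frame.groupby(["bin", "category"])[["wv", "weight"]].sum()
    agg = (sums["wv"] / sums["weight"]).unstack("category")
    if log:
        agg = np.exp(agg)
    if norm:
        # Rescale each date bin so its categories sum to 1.
        agg = agg.div(agg.sum(axis=1), axis=0)
    return agg
```

Each row of the result is a date bin and each column a category; a category with no samples in a bin comes out as NaN. With norm=True every row sums to 1, so the output reads as per-bin proportions.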