cluster_df
- outbreak_tools.cluster_df(df, clusters, tree, lineage_key=None, norm=True)
Aggregate the columns of a dataframe into phylogenetic groups.
- Parameters
df -- A dataframe of prevalence signals. Rows are assumed to be date bins and columns are assumed to be lineages.
clusters -- A tuple (U,V) of sets of root nodes representing clusters (from cluster_lineages).
tree -- A frozendict representing the root of the phylogenetic tree object.
lineage_key -- An OrderedDict mapping names to tree nodes.
norm -- Whether to assume that values in a row should sum to one.
- Returns
A tuple (data, names, is_inclusive), where data is the input dataframe with aggregated and relabeled columns, names contains the name of the root lineage for each column/group, and is_inclusive indicates, for each column, whether its root is in U or V.
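
The column aggregation described above can be illustrated with plain pandas. This is a minimal sketch of the idea only, not the implementation of cluster_df: it replaces the tree and the (U,V) cluster sets with a hypothetical flat mapping group_root from each lineage to the root of its group, it does not compute is_inclusive, and the helper name aggregate_columns and the sample data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical prevalence table: rows are date bins, columns are lineages.
df = pd.DataFrame(
    {"BA.1": [0.5, 0.25], "BA.1.1": [0.25, 0.25], "BA.2": [0.25, 0.5]},
    index=pd.to_datetime(["2022-01-03", "2022-01-10"]),
)

# Hypothetical stand-in for the tree and clusters:
# each lineage is mapped directly to the root lineage of its group.
group_root = {"BA.1": "BA.1", "BA.1.1": "BA.1", "BA.2": "BA.2"}


def aggregate_columns(df, group_root, norm=True):
    """Sum the columns that share a root and relabel them by that root."""
    # Transpose so lineages become the index, group them by their root,
    # sum within each group, then transpose back to date-bin rows.
    data = df.T.groupby(group_root).sum().T
    if norm:
        # Rescale each row so that its values sum to one.
        data = data.div(data.sum(axis=1), axis=0)
    names = list(data.columns)
    return data, names


data, names = aggregate_columns(df, group_root)
print(names)
print(data)
```

Here the BA.1 and BA.1.1 columns are merged into a single BA.1 column, while BA.2 is left on its own. With norm=True each row is rescaled to sum to one, which leaves this sample unchanged because its rows already do.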