cluster_df

outbreak_tools.cluster_df(df, clusters, tree, lineage_key=None, norm=True)

Aggregate the columns of a dataframe into phylogenetic groups.

Parameters
  • df -- A dataframe of prevalence signals. Rows are assumed to be date bins and columns are assumed to be lineages.

  • clusters -- A tuple (U,V) of sets of root nodes representing clusters, as returned by cluster_lineages.

  • tree -- A frozendict representing the root node of the phylogenetic tree.

  • lineage_key -- An OrderedDict mapping lineage names to tree nodes.

  • norm -- Whether to assume that the values in each row sum to one.
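To illustrate how the clusters and the tree could relate, here is a minimal sketch that assigns every lineage in a tree to its nearest cluster root. It is not the outbreak_tools implementation. Assumptions: each node is a plain dict with hypothetical "name" and "children" keys (the real frozendict nodes may be laid out differently), roots are given as a set of names rather than node objects, and assign_to_roots is a hypothetical helper.

```python
def assign_to_roots(tree, roots):
    """Map each lineage name to its nearest ancestor-or-self in `roots`.

    Hypothetical node layout: {"name": str, "children": [node, ...]}.
    Lineages with no ancestor in `roots` are left out of the mapping.
    """
    assignment = {}
    # Iterative depth-first walk, carrying the deepest cluster root seen so far.
    stack = [(tree, None)]
    while stack:
        node, current_root = stack.pop()
        if node["name"] in roots:
            current_root = node["name"]
        if current_root is not None:
            assignment[node["name"]] = current_root
        for child in node.get("children", ()):
            stack.append((child, current_root))
    return assignment


tree = {
    "name": "B",
    "children": [
        {"name": "B.1", "children": [{"name": "B.1.1"}, {"name": "B.1.2"}]},
        {"name": "B.2"},
    ],
}
print(assign_to_roots(tree, {"B.1", "B.2"}))
```

Because the walk carries the most recent root on each path, a lineage under nested roots is assigned to the deeper one.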

Returns

A tuple (data, names, is_inclusive), where data is the input dataframe with its columns aggregated into groups and relabeled, names contains the name of the root lineage of each column (group), and is_inclusive indicates, for each column, whether its root is in U or in V.
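The column aggregation described above can be sketched with pandas by summing columns that share a root lineage. This is an illustration under assumptions, not the library's code: the lineage-to-root mapping is supplied directly, unmapped columns are dropped, and the sketch does not reproduce norm or is_inclusive. aggregate_columns is a hypothetical name.

```python
import pandas as pd


def aggregate_columns(df, assignment):
    """Sum lineage columns that share a root lineage.

    `assignment` maps column (lineage) names to root-lineage names.
    Columns missing from `assignment` are dropped in this sketch.
    """
    kept = [c for c in df.columns if c in assignment]
    # Transpose so lineages become the index, group rows by root, sum, transpose back.
    data = df[kept].T.groupby(lambda c: assignment[c]).sum().T
    names = list(data.columns)
    return data, names


df = pd.DataFrame(
    {"B.1.1": [0.2, 0.1], "B.1.2": [0.3, 0.4], "B.2": [0.5, 0.5]},
    index=["2021-01", "2021-02"],  # rows are date bins
)
data, names = aggregate_columns(df, {"B.1.1": "B.1", "B.1.2": "B.1", "B.2": "B.2"})
print(names)
print(data)
```

Grouping on the transposed frame avoids the deprecated axis=1 form of DataFrame.groupby.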