cluster_lineages

outbreak_clustering.cluster_lineages(prevalences, tree, lineage_key=None, n=10, alpha=0.15)

Cluster lineages by greedy group-splitting on the phylogenetic tree, starting from the root and guided by heuristics.

Parameters
  • prevalences -- A dict, pandas Series, or other mapping from lineage names to (un-normalized) prevalences.

  • tree -- A frozendict representing the root of the phylogenetic tree.

  • lineage_key -- Optional (defaults to None). An OrderedDict mapping names to tree nodes.

  • n -- The target number of clusters.

  • alpha -- Heuristic control parameter in the open interval (0, 1). Higher values avoid more low-quality groups but can prevent convergence on some data.

Returns

A tuple (U, V) of sets of group root lineages. Each group in U contains all descendant lineages of its root, while each group in V excludes the lineages of one or more distal groups (in U or V) nested beneath its root.
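
The return value lists only group roots, so expanding each root into its member lineages can help when inspecting a result. The following is a minimal sketch of one way to do that, not part of this module. It assumes the tree is available as a plain dict mapping each lineage name to a list of its child names; that representation, the helper name expand_groups, and the toy lineage names are all hypothetical, and the actual frozendict tree layout may differ.

```python
def expand_groups(children, U, V):
    """Map each group root to the set of lineages in its group.

    children: hypothetical dict of lineage name -> list of child names.
    U, V: sets of group root lineages, as described under Returns.
    """
    roots = set(U) | set(V)
    groups = {}
    for root in roots:
        members = set()
        stack = [root]
        while stack:
            node = stack.pop()
            members.add(node)
            for child in children.get(node, ()):
                # Do not descend into another group's root: that more
                # distal group owns its own subtree.
                if child not in roots:
                    stack.append(child)
        groups[root] = members
    return groups


# Toy tree with made-up lineage names (illustrative only).
children = {
    "A": ["A.1", "A.2"],
    "A.1": ["A.1.1", "A.1.2"],
    "A.2": ["A.2.1"],
}
U = {"A.1"}  # full subtree below A.1
V = {"A"}    # everything below A except the A.1 group

groups = expand_groups(children, U, V)
print(groups["A.1"])
print(groups["A"])
```

In this toy case the group rooted at "A.1" keeps its whole subtree, while the group rooted at "A" keeps only "A", "A.2", and "A.2.1", because the "A.1" subtree belongs to the more distal group.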