cluster_lineages
- outbreak_clustering.cluster_lineages(prevalences, tree, lineage_key=None, n=10, alpha=0.15)
Cluster lineages by greedily splitting groups on the phylogenetic tree, starting from the root. Which splits are accepted is decided by heuristics whose strictness is set by alpha.
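The heuristics are not spelled out here, so the following is only a minimal sketch of one greedy group-splitting scheme, not the `outbreak_clustering` implementation. It assumes a hypothetical node layout (plain dicts with `"name"` and optional `"children"` keys) standing in for the real frozendict nodes. It also assumes a split is accepted only if both the new group and the remainder of its parent group keep at least an `alpha` fraction of the parent's prevalence, with the most balanced acceptable split taken first. `greedy_split`, `_preorder` and `_owner` are illustrative names.

```python
def _preorder(node):
    """Yield a node and all of its descendants, root first."""
    yield node
    for child in node.get("children", ()):
        yield from _preorder(child)


def _owner(name, roots, parent):
    """Walk up from `name` to the nearest group root (the tree root always is one)."""
    while name not in roots:
        name = parent[name]
    return name


def greedy_split(prevalences, tree, n=10, alpha=0.15):
    """Hypothetical greedy group-splitting sketch; NOT the library's algorithm."""
    by_name = {node["name"]: node for node in _preorder(tree)}
    parent = {
        child["name"]: node["name"]
        for node in by_name.values()
        for child in node.get("children", ())
    }
    # Un-normalized prevalences are fine: alpha is applied as a relative fraction.
    weight = {name: prevalences.get(name, 0) for name in by_name}
    roots = {tree["name"]}  # start with a single group holding every lineage

    while len(roots) < n:
        owner = {name: _owner(name, roots, parent) for name in by_name}
        mass = {}
        for name, group in owner.items():
            mass[group] = mass.get(group, 0) + weight[name]

        best, best_balance = None, 0
        for cand in by_name:
            if cand in roots:
                continue
            group = owner[cand]
            # Prevalence that would move from `group` into a new group rooted at `cand`.
            moved = sum(
                weight[d["name"]]
                for d in _preorder(by_name[cand])
                if owner[d["name"]] == group
            )
            remaining = mass[group] - moved
            balance = min(moved, remaining)
            # Assumed alpha filter: reject splits leaving either side too small.
            if balance < alpha * mass[group]:
                continue
            if balance > best_balance:
                best, best_balance = cand, balance
        if best is None:
            break  # no acceptable split: stop early with fewer than n groups
        roots.add(best)

    # A root goes in V if another group root sits directly inside its group.
    enclosing = {_owner(parent[r], roots, parent) for r in roots if r != tree["name"]}
    return roots - enclosing, roots & enclosing
```

In this sketch a larger `alpha` rejects more unbalanced splits, so on some trees the loop stops before reaching `n` groups, which mirrors the convergence caveat on `alpha`.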
- Parameters
prevalences -- A dict, pandas Series, or other mapping from lineage names to (unnormalized) prevalences.
tree -- A frozendict representing the root node of the phylogenetic tree.
lineage_key -- An OrderedDict mapping lineage names to tree nodes.
n -- The target number of clusters.
alpha -- Heuristic control parameter in the open range (0, 1); higher values avoid more low-quality groups but can prevent convergence on some data.
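The expected layout of `lineage_key` is only described as an OrderedDict from names to tree nodes. A minimal sketch of building one follows, assuming the same hypothetical `{"name": ..., "children": (...)}` node layout in place of the real frozendict nodes. Preorder (root first, children left to right) is also an assumption, since no particular order is stated; `build_lineage_key` is an illustrative helper, not part of `outbreak_clustering`.

```python
from collections import OrderedDict


def build_lineage_key(tree):
    """Map each lineage name to its node, in preorder (root first)."""
    key = OrderedDict()
    stack = [tree]
    while stack:
        node = stack.pop()
        key[node["name"]] = node
        # Push children in reverse so they are popped, and visited, left to right.
        stack.extend(reversed(node.get("children", ())))
    return key
```

An explicit stack is used instead of recursion so very deep trees do not hit Python's recursion limit.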
- Returns
A tuple (U, V) of sets of group root lineages. Groups in U contain all descendant lineages of their roots, while groups in V exclude the lineages of some more distal groups in U or V.
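Assuming each lineage belongs to the nearest group root at or above it in the tree, the returned (U, V) pair can be expanded into a lineage-to-group mapping. The sketch below uses the same hypothetical dict node layout as a stand-in for the frozendict tree; `assign_groups` is illustrative and not part of the package. Lineages that lie above every group root map to `None`.

```python
def assign_groups(tree, U, V):
    """Map every lineage name to the root of the group it belongs to (or None)."""
    roots = set(U) | set(V)
    membership = {}

    def visit(node, current):
        # Entering a group root starts a new, more distal group.
        if node["name"] in roots:
            current = node["name"]
        membership[node["name"]] = current
        for child in node.get("children", ()):
            visit(child, current)

    visit(tree, None)
    return membership
```

Under this reading a root in U never has another group root below it, while a root in V loses the lineages under its nested roots, matching the U/V distinction described above.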