mutations_by_lineage(mutation, location, pango_lin)
- outbreak_data.mutations_by_lineage(mutation=None, location=None, pango_lin=None, lineage_crumbs=False, datemin=None, datemax=None, freq=None, server='api.outbreak.info')
Returns the prevalence of a mutation, or a series of mutations, across the specified lineages, by location.
- Arguments:
- mutation:
(Optional). List or string of mutations separated by “,”.
- location:
(Optional). A string. If not specified, returns results for the most recent date globally.
- pango_lin:
(Optional). If not specified, returns all Pango lineages containing that mutation.
- lineage_crumbs:
If True, returns data for descendant lineages of pango_lin. Include the wildcard symbol ‘*’ in the string to return info on all related lineages.
- freq:
(Optional). Minimum frequency threshold for the prevalence of a mutation in a lineage.
- datemin:
(Optional). A string representing the first cutoff date for the returned data. Must be in YYYY-MM-DD format and be before ‘datemax’.
- datemax:
(Optional). A string representing the second cutoff date. Must be in YYYY-MM-DD format and be after ‘datemin’.
- return:
A pandas DataFrame.
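The argument rules above (mutations as a list or a “,”-separated string, dates in YYYY-MM-DD format with datemin before datemax) can be checked before a request is sent. The following is a minimal sketch, not part of the package: build_query_kwargs and _parse_date are hypothetical helpers, the second mutation is only an illustrative input, and the keyword names are taken from the signature shown above.

```python
import re
from datetime import datetime

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_date(value, name):
    """Return a datetime for a strict YYYY-MM-DD string, else raise ValueError."""
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"{name} must be in YYYY-MM-DD format, got {value!r}")
    # strptime also rejects impossible dates such as 2022-02-30.
    return datetime.strptime(value, "%Y-%m-%d")


def build_query_kwargs(mutations, datemin=None, datemax=None, freq=None):
    """Hypothetical helper: assemble keyword arguments for mutations_by_lineage."""
    # Accept a list of mutations or an already comma-separated string.
    if not isinstance(mutations, str):
        mutations = ",".join(mutations)
    kwargs = {"mutation": mutations}

    start = _parse_date(datemin, "datemin") if datemin is not None else None
    end = _parse_date(datemax, "datemax") if datemax is not None else None
    if start is not None and end is not None and start >= end:
        raise ValueError("datemin must be before datemax")

    # Only forward the optional arguments that were actually supplied.
    for key, value in (("datemin", datemin), ("datemax", datemax), ("freq", freq)):
        if value is not None:
            kwargs[key] = value
    return kwargs


kwargs = build_query_kwargs(
    ["orf1b:p314l", "s:n501y"],  # illustrative mutations
    datemin="2022-01-01",
    datemax="2022-06-30",
    freq=0.5,
)
print(kwargs)
# With network access and the module imported as `od` (an assumption):
# df = od.mutations_by_lineage(**kwargs)
```

Validating locally keeps a malformed date or a reversed date range from reaching the server, where the resulting error may be harder to interpret.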
Example usage:
# Assumes the module was imported as: from outbreak_data import outbreak_data as od
# Get info on mutation 'orf1b:p314l'
df = od.mutations_by_lineage('orf1b:p314l')
print(df)
pangolin_lineage lineage_count mutation_count proportion \
0 ba.2 1227503 1222717 0.996101
1 b.1.1.7 1154337 1147331 0.993931
2 ba.1.1 1044480 1039813 0.995532
3 ay.4 858839 854935 0.995454
4 ba.1 438947 437207 0.996036
... ... ... ... ...
2851 fn.1 1 1 1.000000
2852 miscba1ba2post5386 1 1 1.000000
2853 xbb.1.23 1 1 1.000000
2854 xbb.1.37 1 1 1.000000
2855 xbv 1 1 1.000000
proportion_ci_lower proportion_ci_upper
0 0.995990 0.996210
1 0.993788 0.994071
2 0.995402 0.995658
3 0.995310 0.995595
4 0.995847 0.996219
... ... ...
2851 0.146746 0.999614
2852 0.146746 0.999614
2853 0.146746 0.999614
2854 0.146746 0.999614
2855 0.146746 0.999614
[2856 rows x 6 columns]
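In the output above, the proportion column is consistent with mutation_count divided by lineage_count. The sketch below re-derives it for the first five rows, with the counts copied by hand from the printed output; the proportion function is a hypothetical name used only for illustration, and no claim is made here about how the confidence interval columns are computed.

```python
# (lineage, lineage_count, mutation_count, printed proportion), copied from the output above
rows = [
    ("ba.2", 1227503, 1222717, 0.996101),
    ("b.1.1.7", 1154337, 1147331, 0.993931),
    ("ba.1.1", 1044480, 1039813, 0.995532),
    ("ay.4", 858839, 854935, 0.995454),
    ("ba.1", 438947, 437207, 0.996036),
]


def proportion(mutation_count, lineage_count):
    """Share of a lineage's sequences that carry the mutation."""
    return mutation_count / lineage_count


for lineage, lineage_count, mutation_count, printed in rows:
    computed = proportion(mutation_count, lineage_count)
    print(f"{lineage:8s} computed={computed:.6f} printed={printed:.6f}")
```

Lineages with a single sequence, such as the last rows of the output, report a proportion of 1.000000 but a confidence interval running from 0.146746 to 0.999614, so it may be worth filtering on lineage_count before comparing proportions across lineages.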