lineage_cl_prevalence
- outbreak_data.lineage_cl_prevalence(pango_lin, descendants=False, location=None, mutations=None, datemin=None, datemax=None, cumulative=False, **req_args)
Get the daily prevalence of a set of lineages in clinical sequencing data.
- Parameters
pango_lin -- A lineage name, or a list of lineage names, to query for.
descendants -- If True, include pango_lin together with all of its descendant lineages (works only with a single pango_lin).
location -- A string containing the location ID to query within.
mutations -- A list of mutation names; query within the subset of sequences containing all of these.
datemin -- Optional. Start of the date range to query, as a string in YYYY-MM-DD format.
datemax -- Optional. End of the date range to query, as a string in YYYY-MM-DD format.
cumulative -- If True, return the cumulative global prevalence since the first day of detection.
- Returns
A pandas dataframe containing prevalence data.
- Parameter example
{ 'pango_lin': 'BA.2.86.1', 'descendants': True }
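The parameter example above maps directly onto keyword arguments. The sketch below assembles such a dictionary and checks the date strings before any request is made. `build_query_args` is a hypothetical helper, not part of outbreak_data, and the final call is left as a comment because it needs network access.

```python
from datetime import datetime


def build_query_args(pango_lin, location=None, datemin=None, datemax=None, **extra):
    """Hypothetical helper: collect keyword arguments and validate the dates."""
    parsed = {}
    for name, value in (("datemin", datemin), ("datemax", datemax)):
        if value is not None:
            # Raises ValueError if the string is not a valid year-month-day date.
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
    if len(parsed) == 2 and parsed["datemin"] > parsed["datemax"]:
        raise ValueError("datemin must not be later than datemax")
    args = {"pango_lin": pango_lin, "location": location,
            "datemin": datemin, "datemax": datemax, **extra}
    # Drop unset values so the function's own defaults apply.
    return {key: value for key, value in args.items() if value is not None}


params = build_query_args("BA.2.86.1", location="CAN", descendants=True)
# df = outbreak_data.lineage_cl_prevalence(**params)  # requires network access
```

Validating locally keeps a malformed date from turning into a failed or empty remote query.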
Example Usage
Get the prevalence data for BA.2.86.1 in Canada:
>>> df = outbreak_data.lineage_cl_prevalence('BA.2.86.1', location = 'CAN')
>>> df
                      total_count  lineage_count  total_count_rolling  \
date       query
2023-09-08 BA.2.86.1          270              1           251.000000
2023-09-09 BA.2.86.1          204              0           260.571429
2023-09-10 BA.2.86.1          227              0           266.285714
2023-09-11 BA.2.86.1          390              0           290.285714
2023-09-12 BA.2.86.1          409              1           300.142857
...                           ...            ...                  ...
2024-05-11 BA.2.86.1           20              0            37.285714
2024-05-12 BA.2.86.1           24              0            35.000000
2024-05-13 BA.2.86.1           69              0            36.285714
2024-05-14 BA.2.86.1           36              0            35.142857
2024-05-15 BA.2.86.1            3              0            31.428571

                      lineage_count_rolling  proportion  proportion_ci_lower  \
date       query
2023-09-08 BA.2.86.1               0.142857    0.000569             0.000002
2023-09-09 BA.2.86.1               0.142857    0.000548             0.000002
2023-09-10 BA.2.86.1               0.142857    0.000536             0.000002
2023-09-11 BA.2.86.1               0.142857    0.000492             0.000002
2023-09-12 BA.2.86.1               0.285714    0.000952             0.000002
...                                     ...         ...                  ...
2024-05-11 BA.2.86.1               0.000000    0.000000             0.000013
2024-05-12 BA.2.86.1               0.000000    0.000000             0.000014
2024-05-13 BA.2.86.1               0.000000    0.000000             0.000014
2024-05-14 BA.2.86.1               0.000000    0.000000             0.000014
2024-05-15 BA.2.86.1               0.000000    0.000000             0.000016

                      proportion_ci_upper
date       query
2023-09-08 BA.2.86.1             0.009948
2023-09-09 BA.2.86.1             0.009569
2023-09-10 BA.2.86.1             0.009390
2023-09-11 BA.2.86.1             0.008617
2023-09-12 BA.2.86.1             0.008331
...                                   ...
2024-05-11 BA.2.86.1             0.065207
2024-05-12 BA.2.86.1             0.068777
2024-05-13 BA.2.86.1             0.066944
2024-05-14 BA.2.86.1             0.068777
2024-05-15 BA.2.86.1             0.077230

[251 rows x 7 columns]
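The returned frame is indexed by (date, query), as the output above shows. A minimal sketch of pulling one lineage out of that index follows. The frame is rebuilt by hand from three of the printed rows so the snippet runs offline. Storing the dates as strings is an assumption, since the real index may hold datetime values.

```python
import pandas as pd

# Three rows copied from the output above, rebuilt with the same (date, query) index.
index = pd.MultiIndex.from_tuples(
    [("2023-09-08", "BA.2.86.1"),
     ("2023-09-09", "BA.2.86.1"),
     ("2023-09-12", "BA.2.86.1")],
    names=["date", "query"],
)
df = pd.DataFrame(
    {"total_count": [270, 204, 409],
     "lineage_count": [1, 0, 1],
     "proportion": [0.000569, 0.000548, 0.000952]},
    index=index,
)

# Select one lineage; the result is indexed by date only.
ba286 = df.xs("BA.2.86.1", level="query")

# Days on which the lineage was sequenced at least once.
detected = ba286[ba286["lineage_count"] > 0]

# Prevalence as a percentage, rounded to four decimals.
percent = (ba286["proportion"] * 100).round(4)
```

When several lineages are queried at once, the same `xs` call separates them by the `query` level.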
Get the prevalence data for BA.2 in Canada from 2022-03-01 to 2022-03-08:
>>> df = outbreak_data.lineage_cl_prevalence('BA.2', location = 'CAN',
...                                          datemin = '2022-03-01',
...                                          datemax = '2022-03-08')
>>> df
                  total_count  lineage_count  total_count_rolling  \
date       query
2022-03-01 BA.2           569             77           569.000000
2022-03-02 BA.2           626             71           597.500000
2022-03-03 BA.2           572             78           589.000000
2022-03-04 BA.2           540             72           576.750000
2022-03-05 BA.2           413             70           544.000000
2022-03-06 BA.2           457             59           529.500000
2022-03-07 BA.2           549             75           532.285714
2022-03-08 BA.2           653            114           544.285714

                  lineage_count_rolling  proportion  proportion_ci_lower  \
date       query
2022-03-01 BA.2               77.000000    0.135325             0.109091
2022-03-02 BA.2               74.000000    0.123849             0.099185
2022-03-03 BA.2               75.333333    0.127900             0.102257
2022-03-04 BA.2               74.500000    0.129172             0.102845
2022-03-05 BA.2               73.600000    0.135294             0.109177
2022-03-06 BA.2               71.166667    0.134404             0.106981
2022-03-07 BA.2               71.714286    0.134729             0.108272
2022-03-08 BA.2               77.000000    0.141470             0.114181

                  proportion_ci_upper
date       query
2022-03-01 BA.2              0.165257
2022-03-02 BA.2              0.151938
2022-03-03 BA.2              0.156063
2022-03-04 BA.2              0.157372
2022-03-05 BA.2              0.166742
2022-03-06 BA.2              0.164927
2022-03-07 BA.2              0.166358
2022-03-08 BA.2              0.172709
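The rolling columns in this output are consistent with a trailing 7-day mean that uses however many days are available at the start of the range, and `proportion` matches `lineage_count_rolling / total_count_rolling`. This is inferred from the printed numbers, not taken from the library's source. The sketch below recomputes the values in plain Python so the arithmetic is visible; `trailing_mean` is a hypothetical helper.

```python
def trailing_mean(values, window=7):
    """Mean of the last `window` values up to each position (fewer at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Raw daily counts copied from the BA.2 output above.
total_count = [569, 626, 572, 540, 413, 457, 549, 653]
lineage_count = [77, 71, 78, 72, 70, 59, 75, 114]

total_rolling = trailing_mean(total_count)
lineage_rolling = trailing_mean(lineage_count)

# Ratio of the two smoothed series, matching the printed proportion column.
proportion = [l / t for l, t in zip(lineage_rolling, total_rolling)]

for t, l, p in zip(total_rolling, lineage_rolling, proportion):
    print(f"{t:.6f} {l:.6f} {p:.6f}")
```

With pandas, the same smoothing can be written as `series.rolling(7, min_periods=1).mean()`.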