Lineage Prevalence Analysis ----------------------------------------- The ``outbreak_data`` package has contains multiple endpoints that can collect information on SARS-CoV-2 lineages. Pulling data from a combination of endpoints will allow you to conduct your own analysis on the progression of SARS-CoV-2. On this page, you'll find a few example workflows that demonstrate how to collect, manipulate, and visualize prevalence data in SARS-CoV-2 lineages. Here is how we would go about collecting data to find all the XBB lineages prevalent in India within a 1-year timeframe:: # Perform authentication if you haven't already from outbreak_data import authenticate_user authenticate_user.authenticate_new_user() # Import outbreak_data package from outbreak_data import outbreak_data as od # Get the prevalence of all circulating XBB lineages in India data = od.prevalence_by_location("IND", startswith = 'xbb') # multiply prevalence values by 100% for scale data['prevalence_rolling'] = data['prevalence_rolling'].apply(lambda x: x*100) # Search for data based on date range data = data.sort_values(by="date") data = data.loc[data["date"].between("2020-09-12", "2022-03-31")] ## Use the visual package of your choice to create an area graph using your data import altair as alt # Graph of results alt.Chart(data, title = "Lineage Prevalence in India").mark_area().encode( x='date:T', y=alt.Y('prevalence_rolling:Q'), color = 'lineage:N') .. code-block:: :caption: Output: date total_count lineage_count lineage prevalence \ 3014 2022-09-12 0 0 xbb.1.16 0.000000 3781 2022-09-12 0 0 xbb.2.3 0.000000 2593 2022-09-12 152 2 xbb.1 0.013158 3782 2022-09-13 0 0 xbb.2.3 0.000000 3015 2022-09-13 0 0 xbb.1.16 0.000000 ... ... ... ... ... ... 4086 2023-03-31 196 2 xbb.2.3.2 0.010204 3322 2023-03-31 196 29 xbb.1.16.1 0.147959 2793 2023-03-31 196 1 xbb.1 0.005102 3381 2023-03-31 196 7 xbb.1.16.2 0.035714 3981 2023-03-31 196 15 xbb.2.3 0.076531 prevalence_rolling 3014 0.000000 3781 0.000000 2593 0.003451 3782 0.000000 3015 0.000000 ... ... 4086 0.031184 3322 0.144578 2793 0.014174 3381 0.045358 3981 0.084337 [985 rows x 6 columns] .. image:: graphs/prev_visual.png .. note:: The `Vega-Altair `_ visualization package is used for demonstration purposes. However, any Python visual package can be used to create graphical representations of the data. **Finding the Most Prevalent Lineages** If we wanted to determine and plot the top four most prevalent lineages in India, we can make a few queries and use a few simple commands to create a table that shows us what these lineages are:: data=od.prevalence_by_location("IND") most_prev = data.groupby('lineage').apply(max) # Finds the lineages with the most hits most_prev = most_prev.mask(most_prev == '').dropna(how = 'any') # Drop any unknowns most_prev = most_prev.iloc[:4] print(most_prev) .. code-block:: :caption: Output date total_count lineage_count lineage prevalence \ lineage ba.2 2023-04-20 5668 1445 ba.2 0.822785 ba.2.10.1 2023-04-19 5668 93 ba.2.10.1 0.285714 bq.1.1 2023-03-27 402 7 bq.1.1 0.428571 ch.1.1 2023-02-13 119 4 ch.1.1 0.400000 prevalence_rolling lineage ba.2 0.677541 ba.2.10.1 0.095541 bq.1.1 0.156863 ch.1.1 0.066667 Next we'll collect the prevalence data on each of the four lineages:: # Retrieve the official data on the prevalences of these lineages using `daily_prev()` d1 = od.daily_prev('ba.2', "IND") d2 = od.daily_prev('ba.2.10.1', "IND") d3 = od.daily_prev('bq.1.1', "IND") d4 = od.daily_prev( 'ch.1.1', "IND") # Formatting for creating the graph d1['lineage'] = 'ba.2' d2['lineage'] = 'ba.2.10.1' d3['lineage'] = 'bq.1.1' d4['lineage'] = 'ch.1.1' # Group together data from each lineage data = pd.concat([d1, d2, d3, d4]) data = data.rename(columns = {'proportion': 'proportion (%)'}) #Pick a date range to analyze data = data.sort_values(by="date") data = data.loc[data["date"].between("2022-09-12", "2023-03-31")] # Increase prevalence by 100% data['proportion'] = data['proportion'].apply(lambda x: x*100) #Graph using preferred visual package import altair as alt alt.Chart(data, title = "Top 4 Most Prevalent Lineages in India").mark_area().encode( x='date:T', y=alt.Y('proportion (%):Q'), color = 'lineage:N') .. image:: graphs/top4.png