.. Py_Outbreak_API documentation master file, created by
sphinx-quickstart on Sat Oct 8 22:16:44 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to the Python Outbreak.info package docs!
========================================================
Here you can find information on the functions you will use to collect and analyze SARS-COV-2 data from the Outbreak.info API.
Our package pulls data from the `Outbreak.info API `_ and is reflected on our `Outbreak.info web interface `_
Installation
----------------
We recommend installing the package via pip using:
``pip install python-outbreak-info``
Alternatively, the package can be directly installed from source via pip:
``pip install git+https://github.com/outbreak-info/python-outbreak-info.git``
Getting Started
----------------
The Python Outbreak.info package contains key functions for accessing genomic and epidemiological data for SARS-CoV-2. Access to genomic data requires logging in using GISAID credentials to generate an API key, using the ``authenticate_new_user()`` function. To perform authentication, you'll need to first run
.. code-block:: python
from outbreak_data.authenticate_user import authenticate_new_user
authenticate_new_user()
and then you should be able access all of the functionality of the package. Most of the rest of the tools are available within the ``outbreak_data`` component of the package. For example:
.. code-block:: python
from outbreak_data import outbreak_data
lin_list = ['B.1.1.7','B.1.351','B.1.617.2']
# request lineages occurring with minimum frequency of 0.05 (5%)
df = outbreak_data.known_mutations(lin_list,freq=0.05)
# filter mutations and sort by codon number
df = df[df['gene']=='S'].sort_values(by='codon_num')
For wastewater abundance analyses, users will need to supply the appropriate location code corresponding to their location of interest and a date range. To do this, users would first retrieve wastewater data from ``outbreak_data`` then aggregate the data by date and weight to get the abundances for each lineage using the ``outbreak_tools`` part of the package. An example lookup should look like:
.. code-block:: python
from outbreak_data import outbreak_data
from outbreak_tools import outbreak_tools
state = "Ohio"
startdate, enddate = "2023-09-01", "2023-10-01"
ww_samples = outbreak_data.get_wastewater_samples(region=state, date_range=[startdate, enddate])
ww_samples = outbreak_data.get_wastewater_lineages(ww_samples)
ww_abundances = outbreak_tools.datebin_and_agg(ww_samples, weights=outbreak_tools.get_ww_weights(ww_samples), startdate=startdate, enddate=enddate, freq='7D')
# This data frame is large, so we'll focus on one lineage
ww_abundances['B.1.1.191']
(2023-08-31, 2023-09-07] 0.068126
(2023-09-07, 2023-09-14] 0.017081
(2023-09-14, 2023-09-21] 0.030141
(2023-09-21, 2023-09-28] 0.031744
Name: EG.2, dtype: float64
**About Clinical and Wastewater Tools**
Toward the beginning of the SARS-Cov-2 pandemic, viral genome sequencing data were collected through specimens that were obtained from clinical testing. Yet studies have shown that the method has introduced sampling bias due to systemic healthcare disparities, particularly in poor and underserved communities. In contrast, wastewater samples have been highly useful for tracking regional infection dynamics while providing less biased abundance estimates than clinical testing. Data collected by tracking viral genomic sequences in wastewater has also improved community prevalence estimates and detects emerging variants earlier on.
In clinical genomic data, each sample contains one sequence, allowing us to see whether two or more mutations occur frequently together. However, wastewater data samples contain a mix of sequences, and it's unclear which mutations go with which variants exactly. Analyzing clinical data would then be needed to answer co-occurence questions.
The Andersen Lab has developed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. The resulting data is now deployed by Python-outbreak-info. In short, SARS-Cov-2 analysis should be done using both clinical and wastewater tools in order to see the full picture in viral genomic data.
`Click here `_ for more information on wastewater analysis.
Table of Contents:
===================
User Authentication (*outbreak_data.authenticate_user*)
--------------------------------------------------------
.. toctree::
authenticate_user.authenticate_new_user
Core Outbreak Data Tools (*outbreak_data.outbreak_data*)
---------------------------------------------------------
.. toctree::
outbreak_data.all_lineage_prevalences
outbreak_data.cases_by_location
outbreak_data.daily_lag
outbreak_data.growth_rates
outbreak_data.gr_significance
outbreak_data.known_mutations
outbreak_data.lineage_by_sub_admin
outbreak_data.lineage_cl_prevalence
outbreak_data.most_recent_cl_data
outbreak_data.mutation_details
outbreak_data.mutation_prevalences
outbreak_data.sequence_counts
outbreak_data.wildcard_location
outbreak_data.wildcard_lineage
Wastewater Analysis Tools (*outbreak_data.outbreak_data*)
-----------------------------------------------------------
.. toctree::
outbreak_data.get_wastewater_latest
outbreak_data.get_wastewater_lineages
outbreak_data.get_wastewater_metadata
outbreak_data.get_wastewater_mutations
outbreak_data.get_wastewater_samples
outbreak_data.get_wastewater_samples_by_lineage
outbreak_data.get_wastewater_samples_by_mutation
Plotting and Organization Toolkit (*outbreak_tools.outbreak_tools*)
---------------------------------------------------------------------
.. toctree::
outbreak_tools.cluster_df
outbreak_tools.const_idx
outbreak_tools.datebin_and_agg
outbreak_tools.first_date
outbreak_tools.get_colors
outbreak_tools.get_riverplot_baseline
outbreak_tools.get_tree
outbreak_tools.get_ww_weights
Lineage Clustering (*outbreak_tools.outbreak_clustering*)
----------------------------------------------------------
.. toctree::
outbreak_clustering.cluster_lineages
outbreak_clustering.gather_groups
outbreak_clustering.get_agg_prevalence
outbreak_clustering.get_compressed_tree
outbreak_clustering.get_lineage_key
Example Applications and Analyses
----------------------------------
.. toctree::
Epidemiological data analyses
Mutation Data Analyses
Dealing with Cryptic Variants