Welcome to the Python Outbreak.info package docs!
Here you can find information on the functions you will use to collect and analyze SARS-CoV-2 data from the Outbreak.info API. The package pulls its data from the Outbreak.info API, the same data shown on the Outbreak.info web interface.
Installation
We recommend installing the package via pip using:
pip install python-outbreak-info
Alternatively, the package can be directly installed from source via pip:
pip install git+https://github.com/outbreak-info/python-outbreak-info.git
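To confirm that the installation worked, a quick import check can help. The snippet below is a minimal sketch rather than part of the package: missing_modules is a hypothetical helper name, and the module names are taken from the import statements used later on this page.

```python
import importlib.util


def missing_modules(names=("outbreak_data", "outbreak_tools")):
    """Return the top-level module names that cannot be found.

    missing_modules is a hypothetical helper, not part of the package.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

An empty list means both modules can be located by the current Python interpreter; it does not check authentication or network access.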
Getting Started
The Python Outbreak.info package contains key functions for accessing genomic and epidemiological data for SARS-CoV-2. Access to genomic data requires logging in with GISAID credentials to generate an API key, which is done with the authenticate_new_user() function. To authenticate, you’ll first need to run:
from outbreak_data import authenticate_user
authenticate_user.authenticate_new_user()
Once authenticated, you should be able to access all of the functionality of the package. Most of the remaining tools are available within the outbreak_data component of the package. For example:
from outbreak_data import outbreak_data
lin_list = ['B.1.1.7', 'B.1.351', 'B.1.617.2']
# request mutations occurring in these lineages with a minimum frequency of 0.05 (5%)
df = outbreak_data.lineage_mutations(lin_list, freq=0.05)
# keep only Spike (S) gene mutations and sort by codon number
df = df[df['gene'] == 'S'].sort_values(by='codon_num')
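The filter-and-sort step can be tried offline on a small hand-built table. This is only a sketch: the DataFrame below is a stand-in for the real query result, the gene and codon_num column names come from the example above, and the label column and all values are placeholders invented for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by lineage_mutations().
# 'gene' and 'codon_num' match the example above; 'label' and every
# value here are illustrative placeholders, not real query results.
df = pd.DataFrame({
    "label": ["mut_a", "mut_b", "mut_c", "mut_d"],
    "gene": ["S", "ORF1a", "S", "N"],
    "codon_num": [501, 3675, 69, 203],
})

# Boolean mask keeps rows whose gene is 'S'; sort_values orders them by codon.
spike = df[df["gene"] == "S"].sort_values(by="codon_num")

print(spike)
```

The same pattern applies to any other gene value by changing the string in the comparison.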
For location-specific analyses, users need to supply the location code for their location of interest. To help with this, we provide an ID lookup tool in the outbreak_tools part of the package. An example lookup looks like:
from outbreak_tools import outbreak_tools
location_list = outbreak_tools.id_lookup(['Illinois','South Africa','Chile'])
# which returns ['USA_US-IL', 'ZAF', 'CHL']
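When several locations are in play, it can be convenient to keep the names and codes together. The sketch below is not part of the package: build_location_map is a hypothetical helper, and it assumes that id_lookup returns one code per name, in the same order as the input, as in the example above. The lookup argument is only there so the helper can be tried without network access.

```python
def build_location_map(names, lookup=None):
    """Return a dict mapping each place name to its location code.

    Hypothetical helper. Assumes the lookup returns one code per name,
    in input order, as in the id_lookup example above.
    """
    names = list(names)
    if lookup is None:
        # Imported lazily so the helper can be exercised offline with a stub.
        from outbreak_tools import outbreak_tools
        lookup = outbreak_tools.id_lookup
    codes = lookup(names)
    if len(codes) != len(names):
        raise ValueError("expected exactly one location code per name")
    return dict(zip(names, codes))


if __name__ == "__main__":
    # Stub lookup that replays the codes shown in the example above.
    stub = lambda names: ["USA_US-IL", "ZAF", "CHL"]
    print(build_location_map(["Illinois", "South Africa", "Chile"], lookup=stub))
```

Calling build_location_map(names) without the lookup argument would use outbreak_tools.id_lookup directly; the length check guards against the ordering assumption silently failing.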
Note
This project is under active development.
Core Outbreak Data Tools
- authenticate_new_user()
- cases_by_location(location)
- lineage_mutations(pango_lin, mutation, freq)
- all_lineage_prevalences(location, startswith)
- global_prevalence(pango_lin, mutations, cumulative, lineage_crumbs=False)
- sequence_counts(location, cumulative, sub_admin)
- mutations_by_lineage(mutation, location, pango_lin)
- prevalence_by_location(pango_lin, location, mutations=None, datemin=None, datemax=None, cumulative=None, lineage_crumbs=False)
- lineage_by_sub_admin(pango_lin, mutations, location, ndays=0, detected)
- collection_date(pango_lin, mutations=None, location=None)
- submission_date(pango_lin, mutations=None, location)
- mutation_details(mutations)
- daily_lag(location=None)
- wildcard_lineage(name)
- wildcard_location(name)
- wildcard_mutations(name)
- location_details(location)
- get_outbreak_data()
- growth_rates(lineage, location)