Research overview
The development of software tools and data platforms is central to the work we do.
OpenGWAS
In collaboration with Gibran Hemani (IEU Mendelian randomization programme), Philip Haycock (CRUK Integrative Cancer Epidemiology Programme), Ben Elsworth, Matt Lyon, Tom Palmer and others, we developed the OpenGWAS platform. The platform comprises one of the world's largest open-access databases of genome-wide association studies (GWAS), built on Elasticsearch running on Oracle Cloud, and is integrated with a suite of R and Python tools for post-GWAS analysis, including Mendelian randomization.
EpiGraphDB
A core output of the group, the EpiGraphDB platform is a knowledge graph comprising a graph database (Neo4j), a web interface and API for querying the graph, and an R package. EpiGraphDB includes data from several of our drug-target prioritization projects, is the data source for the ASQ application, and has been used to systematically identify disease risk factors.
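As a rough illustration of querying the graph through the API, the sketch below only builds the URL for a Mendelian randomization evidence query and does not contact the server. The base URL, the `mr` endpoint and the parameter names `exposure_trait`, `outcome_trait` and `pval_threshold` are assumptions that should be checked against the EpiGraphDB API documentation; `build_mr_query_url` is a hypothetical helper, not part of any EpiGraphDB package.

```python
from urllib.parse import urlencode

# Assumed base URL for the EpiGraphDB API; verify against the API documentation.
EPIGRAPHDB_API = "https://api.epigraphdb.org"


def build_mr_query_url(exposure_trait, outcome_trait, pval_threshold=1e-5):
    """Build the URL for an MR evidence query (hypothetical helper).

    The endpoint name and parameter names are assumptions for illustration.
    """
    params = {
        "exposure_trait": exposure_trait,
        "outcome_trait": outcome_trait,
        "pval_threshold": pval_threshold,
    }
    # urlencode escapes spaces and other special characters in trait names.
    return f"{EPIGRAPHDB_API}/mr?{urlencode(params)}"


url = build_mr_query_url("Body mass index", "Coronary heart disease")
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, provided the assumed endpoint matches the live API.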
Other tools
We maintain several other software tools and data platforms, some of which are listed below. The MRCIEU software page provides a more extensive list of MRC IEU software tools.
We aim to make all software open source and data resources open access to maximize the impact of our research.
EpiGraphDB: A graph database drawing together a wide array of data types relevant to population health. Access is provided through an API, web application and R package.
ASQ: A proof-of-principle natural language interface to EpiGraphDB that supports the annotation of “claims” in a piece of text (e.g. preprint abstract) with knowledge from the database.
OpenGWAS: The IEU GWAS database provides an extensive set of openly accessible genome-wide association study datasets. The IEU OpenGWAS database API provides fast programmatic access to the data. The ieugwaspy package provides a Python interface to the API. A suite of other packages has been developed for this resource within the MRC IEU.
MR-Base: An analytical platform for two-sample Mendelian randomization that utilizes the IEU OpenGWAS database
MendelVar: A web-based platform to support gene prioritization using data from Mendelian disease genes, variants identified in clinical genetics and data from disease ontologies
LD Hub [RETIRED]: A database of GWAS summary statistics and an analytical platform for LD score regression
MELODI [RETIRED]: A literature mining platform to identify potential mechanistic pathways between an exposure and outcome
MELODI-Presto: A rapid API to support programmatic exploration of mechanistic pathways between an exposure and outcome
FATHMM: A suite of genetic variant effect prediction algorithms
CScape: A suite of genetic variant effect prediction algorithms for somatic mutations in cancer
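To give a sense of what programmatic access to the OpenGWAS API might look like, the sketch below builds, without sending, a request for variant-trait associations using only the Python standard library. The base URL, the `associations` endpoint, the `variant` and `id` field names, the JSON body and the bearer-token header are all assumptions to be checked against the OpenGWAS API documentation; `build_associations_request` is a hypothetical helper and is not part of ieugwaspy. The variant and study identifiers are placeholders.

```python
import json
from urllib.request import Request

# Assumed base URL for the OpenGWAS API; verify against the API documentation.
OPENGWAS_API = "https://api.opengwas.io/api"


def build_associations_request(variants, study_ids, token):
    """Build (but do not send) a POST request for associations (hypothetical helper).

    Field names and the authentication scheme are assumptions for illustration.
    """
    payload = json.dumps({"variant": list(variants), "id": list(study_ids)})
    return Request(
        url=f"{OPENGWAS_API}/associations",
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder identifiers and token, used only to show the shape of the call.
req = build_associations_request(["rs1205"], ["ieu-a-2"], token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request. Keeping request construction separate from sending makes the example easy to inspect without network access or a valid token.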