MRC IEU: Data Mining Epidemiological Relationships

Neo4j data integration pipeline

Background: We’ve been using Neo4j for around five years in a variety of projects, sometimes as the main database (MELODI) and sometimes as part of a larger platform (OpenGWAS). We find writing queries in Cypher intuitive and query performance good. However, integrating data into a graph is still a challenge, especially when the data come from many different sources. Our latest project, EpiGraphDB, uses data from over 20 independent sources, most of which require cleaning and QC before they can be incorporated.

Reducing drug development costs

A new animation

Overview: This short animation explains how we use Mendelian randomization and colocalization to help prioritise drug targets. One of our aims in both programme 4 of the MRC IEU and the Integrative Cancer Epidemiology Programme is to integrate such prioritisations with other data to help inform drug development. About the animation: The animation is based on recent work by Dr Jie (Chris) Zheng, Vice-Chancellor's Fellow in programme 4 of the MRC IEU, who recently published an innovative Mendelian randomization and colocalization study of plasma protein levels in Nature Genetics. The study demonstrated how genetic data can be used to support drug target prioritisation by identifying the causal effects of proteins on diseases.

Drug target prioritization using protein QTL

Overview: An innovative genetic study of blood protein levels, led by researchers in the DMER programme at the MRC Integrative Epidemiology Unit (MRC-IEU) at the University of Bristol, has demonstrated how genetic data can be used to support drug target prioritisation by identifying the causal effects of proteins on diseases. Working in collaboration with pharmaceutical companies, Bristol researchers have developed a comprehensive analysis pipeline using genetic prediction of protein levels to prioritise drug targets, and have quantified the potential of this approach for reducing the failure rate of drug development.

Exploring Elasticsearch architectures with Oracle Cloud

The IEU GWAS Database: The MRC Integrative Epidemiology Unit (MRC IEU) at the University of Bristol hosts the IEU GWAS Database, one of the world’s largest open collections of Genome-Wide Association Study data. As of April 2019, the database contains over 250 billion genetic association records from more than 20,000 analyses of human traits. The IEU GWAS Database underpins the IEU’s flagship MR-Base analytical platform, which is used by people all over the world to carry out analyses that identify causal relationships between risk factors and diseases, and prioritize potential drug targets.

Indexing 200 billion records in 2 days

Previously we had successfully run GWAS on almost all of the UKBiobank traits. Our next job was to make these searchable at scale. This post explains how we did this and how you can access the data. Background: MR-Base is a platform for performing 2-sample Mendelian Randomization to infer causal relationships between phenotypes. In its early days, GWAS were manually curated and loaded into a database for use with the platform.