MRC IEU: Data Mining Epidemiological Relationships

Neo4j data integration pipeline

Background: We’ve been using Neo4j for around five years in a variety of projects, sometimes as the main database (MELODI) and sometimes as part of a larger platform (OpenGWAS). We find writing queries in Cypher intuitive, and query performance is good. However, integrating data into a graph is still a challenge, especially when combining data from many different sources. Our latest project, EpiGraphDB, uses data from over 20 independent sources, most of which require cleaning and QC before they can be incorporated.
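The excerpt stresses that most sources need cleaning and QC before they can be loaded. Below is a minimal Python sketch of one such per-source step, not the EpiGraphDB pipeline itself. The `GeneNode` type, the field names, and both QC rules are hypothetical. The output uses the CSV header conventions of Neo4j's `neo4j-admin` bulk importer (`:ID(...)` for the node ID, `:LABEL` for the label).

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Mapping


@dataclass(frozen=True)
class GeneNode:
    """Hypothetical node type: one gene record from one external source."""
    ensembl_id: str
    symbol: str
    source: str


def clean_rows(rows: Iterable[Mapping[str, str]], source: str) -> Iterator[GeneNode]:
    """Apply illustrative QC rules and drop duplicate IDs (first record wins)."""
    seen = set()
    for row in rows:
        ens = (row.get("ensembl_id") or "").strip().upper()
        sym = (row.get("symbol") or "").strip()
        # QC rule 1 (hypothetical): require an Ensembl human gene ID and a symbol.
        if not ens.startswith("ENSG") or not sym:
            continue
        # QC rule 2 (hypothetical): keep only the first record per ID.
        if ens in seen:
            continue
        seen.add(ens)
        yield GeneNode(ens, sym, source)


def to_import_csv(nodes: Iterable[GeneNode]) -> str:
    """Render nodes as CSV using the neo4j-admin import header syntax."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # "ensembl_id:ID(Gene)" stores the ID as property ensembl_id in ID group Gene;
    # ":LABEL" sets the node label for each row.
    writer.writerow(["ensembl_id:ID(Gene)", "symbol", "source", ":LABEL"])
    for node in nodes:
        writer.writerow([node.ensembl_id, node.symbol, node.source, "Gene"])
    return buf.getvalue()
```

For example, `to_import_csv(clean_rows(raw_rows, "example_source"))` keeps one row per valid gene ID. Keeping each source's cleaning in its own function means sources can be checked and re-run independently before a single import step.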

Exploring Elasticsearch architectures with Oracle Cloud

The IEU GWAS Database: The MRC Integrative Epidemiology Unit (MRC IEU) at the University of Bristol hosts the IEU GWAS Database, one of the world’s largest open collections of genome-wide association study (GWAS) data. As of April 2019, the database contained over 250 billion genetic association records from more than 20,000 analyses of human traits. The IEU GWAS Database underpins the IEU’s flagship MR-Base analytical platform, which is used by people all over the world to carry out analyses that identify causal relationships between risk factors and diseases and to prioritize potential drug targets.

Indexing 200 billion records in 2 days

Previously, we had successfully run GWAS on almost all of the UK Biobank traits. Our next job was to make these results searchable at scale. This post explains how we did this and how you can access the data. Background: MR-Base is a platform for performing two-sample Mendelian randomization to infer causal relationships between phenotypes. Initially, GWAS were manually curated and loaded into a database for use with the platform.
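Making billions of association records searchable usually means loading them in large batches rather than one record at a time. Below is a minimal Python sketch that assumes an Elasticsearch backend, the subject of the Elasticsearch post listed above. This excerpt does not itself name the search engine used. The sketch splits hypothetical association records into fixed-size batches and renders each batch as the newline-delimited JSON body expected by Elasticsearch's `_bulk` API. The field names (`study_id`, `rsid`, `p`), the document ID scheme, and the index name are illustrative assumptions.

```python
import json
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield fixed-size chunks so each _bulk request stays bounded in size."""
    batch: List[dict] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, chunk


def bulk_payload(records: Iterable[dict], index: str) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document takes two lines: an action line naming the target
    index and document ID, then the document source itself.
    """
    lines = []
    for rec in records:
        # Hypothetical ID scheme: study ID plus variant ID, so re-indexing
        # the same association overwrites rather than duplicates it.
        doc_id = f"{rec['study_id']}:{rec['rsid']}"
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(rec))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Each returned body would be sent as a POST to the `_bulk` endpoint with `Content-Type: application/x-ndjson`. The batch size trades per-request overhead against memory use per request.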