MRC IEU: Data Mining Epidemiological Relationships


The “Data Mining Epidemiological Relationships” programme, led by Prof Tom Gaunt, is funded by the UK Medical Research Council as part of the MRC Integrative Epidemiology Unit at the University of Bristol. We aim to understand the mechanisms of disease by integrating diverse biomedical and epidemiological data and by developing software tools to analyse these data. One of our key developments is EpiGraphDB, a database that integrates epidemiological and biomedical data to support mechanism discovery and aid causal inference.

We are always interested in hearing from potential collaborators or from talented graduate or postgraduate researchers who wish to pursue a career in bioinformatics, molecular epidemiology or data science.

Programme Overview

Population health research is being transformed by the increasing wealth of complex data. New high-dimensional epidemiological datasets provide novel opportunities for systematic approaches to understanding the relationships between risk factors and disease outcomes. This programme builds on our successes in collating data (e.g. the IEU GWAS database), in implementing software to automate causal inference using Mendelian randomization (MR-Base), and in literature mining (MELODI and MELODI-Presto). We will implement a new graph database (EpiGraphDB) that integrates causal estimates with comprehensive data on relationships between traits, risk factors, biomarkers, intervention targets and diseases. Using EpiGraphDB, we will develop new methods to explore the relationships between risk factors and disease, enabling new causal hypotheses to be generated and explored.

Aims and Objectives

We aim to: (a) systematically integrate biological contextual information with causal estimates generated using Mendelian randomization; (b) develop novel approaches to identifying, validating and prioritising potential causal estimates in the context of a wide array of other information; (c) utilise our EpiGraphDB database to inform the development of new Mendelian randomization methods that address pleiotropy; and (d) apply the data and approaches from (a) to (c) to investigate the causal risk factors in cardiovascular disease and cancer.