MRC IEU: Data Mining Epidemiological Relationships

The “Data Mining Epidemiological Relationships” programme, led by Prof Tom Gaunt, is funded by the UK Medical Research Council as part of the MRC Integrative Epidemiology Unit at the University of Bristol. We aim to understand the mechanisms of disease by integrating diverse biomedical and epidemiological data and by developing software tools to analyse these data. One of our key developments is EpiGraphDB, a database that integrates epidemiological and biomedical data to support mechanism discovery and aid causal inference.

Recent posts

Neo4j data integration pipeline

We make extensive use of Neo4j for graph databases (including EpiGraphDB). One of the key challenges in constructing a heterogeneous graph database is integrating data from different sources. Ben Elsworth describes the pipeline he has developed to automate this process.

Exploring Elasticsearch architectures with Oracle Cloud

The IEU OpenGWAS database contains well over 100 billion rows of data on genetic associations. Ben Elsworth describes his work on implementing a cloud-based Elasticsearch database on Oracle Cloud Infrastructure that can handle millions of queries per week.

Indexing 200 billion records in 2 days

A few years ago we started collecting genome-wide association study datasets and making them available to the research community. As the data grew from tens of millions to tens of billions of rows, we found that a MySQL database was no longer sufficient. Ben Elsworth describes how he implemented an Elasticsearch solution to the challenge of querying a very large dataset.