MR-KG: A Knowledge Graph of Mendelian Randomization Evidence Powered by Large Language Models
📌 Background
Mendelian randomization (MR) is a causal inference method that uses genetic variants as instrumental variables, in effect a natural experiment, to assess causal relationships between putative risk factors and disease outcomes. MR studies are increasingly abundant, but synthesising evidence across them remains challenging because of heterogeneity in reporting, in the traits examined, and in the structure of the published literature.
To address this, Liu, Burton, Gatua, Hemani & Gaunt (2025) introduce MR-KG — a knowledge graph of MR evidence automatically extracted from published studies using large language models (LLMs).
🧠 What Is MR-KG?

MR-KG is a structured, machine-readable network of results from Mendelian randomization studies. Instead of manually curating every causal estimate, this project uses state-of-the-art large language models to:
- Extract structured data (e.g., exposures, outcomes, effect estimates) from scientific text
- Link entities such as traits, genetic instruments, and study metadata
- Standardise the relationships so they can be interrogated at scale
This makes MR evidence navigable by computational systems for downstream analysis, search, and reasoning.
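To make "structured data" concrete, here is a minimal Python sketch of what one extracted record and a validation step might look like. The `MRResult` class, its field names, and the JSON shape are assumptions made for illustration; they are not the schema used by MR-KG.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MRResult:
    """One extracted MR finding. All field names are illustrative assumptions."""
    study_id: str                             # hypothetical study identifier
    exposure: str
    outcome: str
    effect_estimate: Optional[float] = None   # None when no estimate is reported
    effect_type: Optional[str] = None         # e.g. "beta" or "OR"


def parse_llm_output(raw: str, study_id: str) -> list[MRResult]:
    """Parse a JSON array of findings (as an LLM might emit) into records.

    Items without both an exposure and an outcome are skipped, not guessed.
    """
    results = []
    for item in json.loads(raw):
        exposure = (item.get("exposure") or "").strip()
        outcome = (item.get("outcome") or "").strip()
        if not exposure or not outcome:
            continue
        estimate = item.get("effect_estimate")
        results.append(MRResult(
            study_id=study_id,
            exposure=exposure,
            outcome=outcome,
            effect_estimate=float(estimate) if estimate is not None else None,
            effect_type=item.get("effect_type"),
        ))
    return results
```

Keeping the record immutable and dropping incomplete items means downstream steps only ever see findings that name both an exposure and an outcome.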
🛠️ How It Works
The MR-KG pipeline has three major components:
- LLM-based Extraction: abstracts of published MR studies are processed through large language models to pull out structured triples (e.g., [exposure, outcome, causal effect estimate]).
- Graph Construction and Storage: extracted results are normalised into a consistent schema and stored as a graph where nodes represent entities (traits, studies, variants) and edges represent relationships (e.g., causal evidence).
- Interactive Access: a live web interface and API (e.g., via https://epigraphdb.org/mr-kg) allow users and programs to query and explore the graph.
The project also includes supplementary tools for quality control and for similarity analyses between studies or traits.
🔍 Key Features
- Automated MR evidence extraction reduces the need for manual curation.
- Knowledge graph format represents complex relationships and enables sophisticated queries.
- LLM-powered processing allows extraction from a wide range of publication styles and formats.
- API & web frontend for interactive use by researchers and software clients.
💡 Why It Matters
MR-KG addresses a key bottleneck in genetic epidemiology: while MR generates important causal insights, the pace and volume of publication make it hard to synthesise evidence consistently.
A structured knowledge graph enables researchers to:
- Rapidly identify all MR evidence linking specific exposures and outcomes.
- Detect overlapping or conflicting causal findings.
- Integrate MR evidence with other biomedical resources for multidimensional analysis.
This could accelerate causally informed hypothesis generation and help triangulate evidence across studies.
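As a rough illustration of detecting conflicting findings, the sketch below groups findings by exposure and outcome and flags pairs whose estimates point in opposite directions. It assumes each finding is a `(study_id, exposure, outcome, beta)` tuple in which the sign of `beta` gives the direction of effect; the function name is hypothetical, and the check deliberately ignores confidence intervals and effect scales, which a real analysis would have to consider.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return exposure-outcome pairs with both positive and negative estimates.

    records: iterable of (study_id, exposure, outcome, beta) tuples (assumed shape).
    """
    by_pair = defaultdict(list)
    for study_id, exposure, outcome, beta in records:
        by_pair[(exposure, outcome)].append((study_id, beta))

    conflicts = {}
    for pair, findings in by_pair.items():
        # Collect the distinct directions of effect, ignoring exact zeros.
        directions = {beta > 0 for _, beta in findings if beta != 0}
        if len(directions) > 1:
            conflicts[pair] = findings
    return conflicts


records = [
    ("s1", "bmi", "t2d", 0.4),
    ("s2", "bmi", "t2d", -0.1),
    ("s3", "ldl", "chd", 0.3),
    ("s4", "ldl", "chd", 0.2),
]
conflicts = find_conflicts(records)
```

In this made-up data only the `("bmi", "t2d")` pair is flagged, because its two studies report estimates with opposite signs.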
🚀 Getting Started
The MR-KG project’s web interface and API are publicly available at https://epigraphdb.org/mr-kg/, allowing other teams to:
- Explore the current graph and extracted evidence
- Build analytical tools on top of the graph
- Contribute improvements to extraction models or schema
🧪 A Note on Preprints
This work is currently a preprint, meaning it has not yet been peer-reviewed. Preprints are valuable for rapid dissemination, but they should be read as early, preliminary reports of research findings.
🔗 Links
- Preprint: https://doi.org/10.64898/2025.12.14.25342218
- Project code repositories:
  - Data extraction pipeline: https://github.com/MRCIEU/llm-data-extraction
  - Knowledge graph construction and interface: https://github.com/MRCIEU/mr-kg