Skip to main content

6 posts tagged with "NLP"

View All Tags

MR-KG: A Knowledge Graph of Mendelian Randomization Evidence Powered by Large Language Models


📌 Background

Mendelian randomization (MR) is a powerful causal inference method that uses genetic variants as natural experiments to assess causal relationships between putative risk factors and disease outcomes. MR studies are increasingly abundant, but synthesising evidence across them remains challenging due to heterogeneity in reporting, traits examined, and the structure of the published literature.

To address this, Liu, Burton, Gatua, Hemani & Gaunt (2025) introduce MR-KG — a knowledge graph of MR evidence automatically extracted from published studies using large language models (LLMs).

Liu et al. "MR-KG: A knowledge graph of Mendelian randomization evidence powered by large language models". 2025, medRxiv DOI:10.64898/2025.12.14.25342218

M-PreSS: a transparent, open-source approach to study screening in systematic reviews


Overview

Screening thousands of titles and abstracts is often the single biggest bottleneck in a systematic review workflow. In this new medRxiv pre-print, we describe M-PreSS: a model pre-training approach that aims to make screening faster without relying on closed, black-box systems.

The key idea is to start from an open biomedical language model (BlueBERT) and fine-tune it for screening using a Siamese neural network setup, so that the resulting model can generalise across different review topics rather than needing a brand-new model each time.

Pilot analysis on BioRxiv and MedRxiv full text data to facilitate comprehensive data mining on biomedical literature


Overview

The BioRxiv and MedRxiv preprint facilities are vital infrastructure for the biomedical research community, which also provide a rich and comprehensive resource for data mining biomedical literature for investigations on research trends, interests, and novel findings. In our previous works we have conducted extensive literature mining efforts on BioRxiv and MedRxiv to extract structural literature knowledge into EpiGraphDB[1] and derive research claims from recent preprints to be triangulated with other evidence on ASQ[2].

Triangulating evidence in health sciences with Annotated Semantic Queries


Update: The ASQ work has now been published in Bioinformatics.

Yi Liu, Tom R Gaunt, Triangulating evidence in health sciences with Annotated Semantic Queries, Bioinformatics, Volume 40, Issue 9, September 2024, btae519, https://doi.org/10.1093/bioinformatics/btae519

Overview

Integrating information from data sources representing different study designs has the potential to strengthen evidence in population health research. However, this concept of evidence “triangulation” presents a number of challenges for systematically identifying and integrating relevant information.

In this medRxiv preprint we present ASQ (Annotated Semantic Queries), a natural language query interface to the integrated biomedical entities and epidemiological evidence in EpiGraphDB . ASQ enables users to extract “claims” from a piece of unstructured text, and then investigate the evidence that could either support, contradict the claims, or offer additional information to the query.

The ASQ approach has the potential to support the rapid review of pre-prints, grant applications, conference abstracts and articles submitted for peer review. ASQ implements strategies to harmonize biomedical entities in different taxonomies and evidence from different sources, to facilitate evidence triangulation and interpretation.

ASQ is openly available at https://asq.epigraphdb.org.

Systematic comparison of Mendelian randomization studies and randomized controlled trials using electronic databases


Overview

Mendelian Randomization (MR) uses genetic instrumental variables to make causal inferences. Whilst sometimes referred to as “nature’s randomized trial”, it has distinct assumptions that make comparisons between the results of MR studies with those of actual randomized controlled trials (RCTs) invaluable.

Evaluating the potential benefits and pitfalls of combining protein and expression quantitative trait loci in evidencing drug targets


Overview

Molecular quantitative trait loci (molQTL), which can provide functional evidence on the mechanisms underlying phenotype-genotype associations, are increasingly used in drug target validation and safety assessment. In particular, protein abundance QTLs (pQTLs) and gene expression QTLs (eQTLs) are the most commonly used for this purpose. However, questions remain on how to best consolidate results from pQTLs and eQTLs for target validation.