3 posts tagged with "machine learning"

M-PreSS: a transparent, open-source approach to study screening in systematic reviews


Overview

Screening thousands of titles and abstracts is often the single biggest bottleneck in a systematic review workflow. In this new medRxiv pre-print, we describe M-PreSS: a model pre-training approach that aims to make screening faster without relying on closed, black-box systems.

The key idea is to start from an open biomedical language model (BlueBERT) and fine-tune it for screening using a Siamese neural network setup, so that the resulting model can generalise across different review topics rather than needing a brand-new model each time.
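To make the Siamese idea concrete, here is a minimal sketch of the general pattern: two texts pass through the same encoder, and their embeddings are compared. This is not the M-PreSS implementation. The `encode` function is a toy hashed bag-of-words stand-in for a fine-tuned BlueBERT encoder, and the function names, the `DIM` size and the `threshold` value are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter

DIM = 64  # assumed embedding size for the toy encoder


def encode(text):
    """Toy stand-in for a shared encoder (hypothetical, not BlueBERT).

    Hashes each token into a fixed-size vector and L2-normalises it.
    """
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def similarity(text_a, text_b):
    """Siamese comparison: both inputs go through the SAME encoder."""
    emb_a, emb_b = encode(text_a), encode(text_b)
    # Cosine similarity (the vectors are already unit length)
    return sum(a * b for a, b in zip(emb_a, emb_b))


def screen(criteria, abstracts, threshold=0.3):
    """Flag abstracts whose similarity to the criteria meets the threshold.

    The threshold is an arbitrary illustrative value.
    """
    return [(text, similarity(criteria, text) >= threshold) for text in abstracts]


if __name__ == "__main__":
    criteria = "randomised trial of statin therapy"
    abstracts = [
        "a randomised trial of statin therapy in older adults",
        "survey of coral reef ecology",
    ]
    for text, include in screen(criteria, abstracts):
        print(include, text)
```

Because the encoder is shared, the same model can score any pair of review criteria and abstract, which is what lets one trained model be reused across review topics.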

CanDrivR-CS: cancer-specific machine learning to separate recurrent from rare missense variants


Overview

Cancer genomes contain huge numbers of mutations, but only a subset are functionally important. One simple clue is recurrence: if the same missense variant shows up repeatedly across patients with the same cancer type, that can suggest positive selection for a growth advantage. At the same time, rare variants can still matter (for example, if they emerge under treatment as resistance mechanisms).

In work led by Amy Francis, we introduce CanDrivR-CS, a framework that trains cancer-type-specific machine-learning models to distinguish recurrent from rare somatic missense variants. It’s a useful reminder that “one-size-fits-all” predictors can miss disease-context signals, and that relatively interpretable models can still surface mechanistic hypotheses.
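As a rough illustration of the recurrent-versus-rare distinction, the sketch below counts how many distinct patients carry each missense variant within a cancer type and labels it accordingly. This is not the CanDrivR-CS labelling procedure. The `min_patients` cut-off, the function name and the placeholder records are assumptions chosen only to show the idea.

```python
from collections import defaultdict


def label_variants(observations, min_patients=2):
    """Label each (cancer_type, variant) pair as 'recurrent' or 'rare'.

    observations: iterable of (patient_id, cancer_type, variant) tuples.
    min_patients: assumed cut-off; a variant seen in at least this many
    distinct patients of the same cancer type counts as recurrent.
    """
    patients = defaultdict(set)
    for patient_id, cancer_type, variant in observations:
        # Use a set so a variant reported twice for one patient counts once
        patients[(cancer_type, variant)].add(patient_id)
    return {
        key: "recurrent" if len(ids) >= min_patients else "rare"
        for key, ids in patients.items()
    }


if __name__ == "__main__":
    # Hypothetical placeholder records, not real data
    records = [
        ("P1", "cancer_A", "GENE1:p.A10V"),
        ("P2", "cancer_A", "GENE1:p.A10V"),
        ("P3", "cancer_A", "GENE2:p.R20H"),
        ("P4", "cancer_B", "GENE1:p.A10V"),
    ]
    for key, label in label_variants(records).items():
        print(key, label)
```

Counting per cancer type rather than across all cancers is the point: in this toy input the same variant is recurrent in `cancer_A` but rare in `cancer_B`.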

DrivR-Base: a feature extraction toolkit for variant effect prediction


Understanding which genetic variants are likely to be functional (and which are probably benign) is a cornerstone of modern human genetics. Over the last decade, variant-effect predictors have become increasingly sophisticated — but behind every model sits the same practical headache: assembling a sensible set of features (annotations) for millions of variants from dozens of databases.

In a 2024 Bioinformatics paper led by Amy Francis, we introduce DrivR-Base, a reproducible, Dockerised toolkit that turns this feature-extraction step into something you can run and re-run with far less pain.