EpiGraphDB version 1.0
The EpiGraphDB platform has been updated with a new major release (version 1.0). This is the first release since version 0.3 in 2020 (what a year!) and the first since the publication of our journal article in Bioinformatics. We believe the underlying integration pipeline, data structure and architecture of the EpiGraphDB platform have now reached a sufficiently stable state, and we are pleased to announce this major release as version 1.0!
In the following sections we highlight a few key new features and changes in this update. For more detailed and technical changes, please visit the changelog in the platform documentation.
New and overhauled data sources
ClinVar
ClinVar is a public archive of reports of genetic variants and interpretations of their clinical relevance to disease. The variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups.
We import ClinVar data (extracted on 2021-01-12) as gene-disease associations, available as the [GENE_TO_DISEASE] relationship in EpiGraphDB. The sources of information for the gene-disease relationship include OMIM, GeneReviews, and a limited amount of curation by NCBI staff.
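As a rough illustration of how the new relationship might be queried, the sketch below builds a Cypher query string for the diseases linked to a given gene. Only the [GENE_TO_DISEASE] relationship name comes from this post; the node labels (Gene, Disease), the property names (name, label), the relationship direction and the helper function itself are assumptions made for illustration and should be checked against the platform documentation.

```python
import re

# Assumption: gene symbols consist of letters, digits, dots and hyphens.
_GENE_SYMBOL = re.compile(r"[A-Za-z0-9.\-]+")


def gene_to_disease_query(gene_name: str, limit: int = 10) -> str:
    """Build a Cypher query for diseases linked to a gene (hypothetical helper).

    The Gene/Disease labels and the `name`/`label` properties are assumed,
    not confirmed by the release notes.
    """
    # Reject anything that is not a plain gene symbol, since the value is
    # interpolated directly into the query string.
    if not _GENE_SYMBOL.fullmatch(gene_name):
        raise ValueError(f"Unexpected gene symbol: {gene_name!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        f'MATCH (g:Gene {{name: "{gene_name}"}})-[r:GENE_TO_DISEASE]->(d:Disease) '
        f"RETURN g.name AS gene, d.label AS disease LIMIT {int(limit)}"
    )


print(gene_to_disease_query("BRCA1", limit=5))
```

The resulting string could then be submitted through whichever Cypher interface you have access to.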
Mapping between EBI GWAS Catalog GWAS traits and EFO terms
To complement the existing semantic mapping between (Gwas) traits and (Efo) ontology terms (GWAS_NLP_EFO), we have added the official mapping between EBI GWAS Catalog traits (available as “ebi-a” studies in OpenGWAS) and EFO terms. This mapping is available as [GWAS_EFO_EBI] in EpiGraphDB.
MR-EvE
We have incorporated the latest MR-EvE evidence into EpiGraphDB, where it is represented as [MR_EVE_MR]. With this update, [MR_EVE_MR] evidence has increased from 583,619 records to 25,804,945 records (for further details visit the metadata and metrics). For further examples regarding the MR-EvE evidence, take a look at the MR view and the confounder view on the EpiGraphDB WebUI, as well as the underlying API endpoints in the EpiGraphDB API.
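For readers who prefer the API over the WebUI, here is a minimal sketch of how a request URL for MR-EvE evidence might be assembled. The base URL, the /mr endpoint path and the parameter names (exposure_trait, outcome_trait, pval_threshold) are assumptions for illustration; please confirm them in the API documentation before relying on them. The helper only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumption: public API base URL and endpoint path.
API_URL = "https://api.epigraphdb.org"
MR_ENDPOINT = "/mr"


def mr_request_url(exposure_trait=None, outcome_trait=None, pval_threshold=1e-5):
    """Build a GET URL for MR evidence (hypothetical helper, assumed parameters)."""
    if exposure_trait is None and outcome_trait is None:
        raise ValueError("Provide an exposure trait, an outcome trait, or both")
    params = {}
    if exposure_trait is not None:
        params["exposure_trait"] = exposure_trait
    if outcome_trait is not None:
        params["outcome_trait"] = outcome_trait
    params["pval_threshold"] = pval_threshold
    # urlencode handles spaces and other special characters in trait names.
    return f"{API_URL}{MR_ENDPOINT}?{urlencode(params)}"


print(mr_request_url(exposure_trait="Body mass index"))
```

The URL can then be fetched with any HTTP client, for example `requests.get(url).json()`.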
Reactome
The Reactome data source has been overhauled and simplified. We now make use of the protein and pathway data sets available to download here.
Literature derived evidence
In addition to the newer version of SemMedDB (semmedVER42_R), we used SemRep to create semantic triples from the MedRxiv and BioRxiv titles and abstracts. As a result, we have renamed the literature nodes and relationships in the graph: instead of (SemmedTriple) we now have (LiteratureTriple), and instead of (SemmedTerm) we now have (LiteratureTerm), each with a _source property (see changelog). Relationships between the new nodes are named after the data source, e.g. [SEMMEDDB_OBJ], [BIORXIV_OBJ] and [MEDRXIV_OBJ] in place of [SEM_OBJ].
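If you have Cypher queries written against the old literature schema, the renaming above can be applied mechanically. The sketch below is a hypothetical helper, not part of the EpiGraphDB tooling: it rewrites the node labels and the [SEM_OBJ] relationship named in this post, and leaves everything else untouched. The example query's shape and relationship direction are assumptions; other renamed relationships should be taken from the changelog.

```python
import re

# Node label renames described in this release.
NODE_RENAMES = {
    "SemmedTriple": "LiteratureTriple",
    "SemmedTerm": "LiteratureTerm",
}

# Relationship prefixes per data source, as named in this release.
SOURCE_PREFIXES = {
    "semmeddb": "SEMMEDDB",
    "biorxiv": "BIORXIV",
    "medrxiv": "MEDRXIV",
}


def migrate_query(query: str, source: str = "semmeddb") -> str:
    """Rewrite an old-schema Cypher query to the new names (illustrative only)."""
    try:
        prefix = SOURCE_PREFIXES[source.lower()]
    except KeyError:
        raise ValueError(f"Unknown literature source: {source!r}") from None
    # Replace whole-word occurrences of the old node labels.
    for old, new in NODE_RENAMES.items():
        query = re.sub(rf"\b{old}\b", new, query)
    # Replace the old object relationship with the source-specific one.
    return re.sub(r"\bSEM_OBJ\b", f"{prefix}_OBJ", query)


old_query = "MATCH (t:SemmedTriple)-[r:SEM_OBJ]->(o:SemmedTerm) RETURN t, o LIMIT 5"
print(migrate_query(old_query, source="biorxiv"))
```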
Codebase
We have refactored our entire graph build pipeline to improve transparency, reliability and robustness. For this we use the neo4j-build-pipeline, which relies on defined schemas and tests to ensure the graph is consistent and clean. More details can be found in the following blog posts: https://www.biocompute.org.uk/post/neo4j_data_integration/ and https://neo4j.com/blog/neo4j-data-integration-pipeline-using-snakemake-and-docker/.
In addition, the source code for the Graph, WebUI and API is now hosted on GitHub under the MRCIEU organisation. We plan to write a separate blog post on the technologies behind the EpiGraphDB platform in the near future.
For further information on the software side of the EpiGraphDB project (as well as other software projects developed in the MRC IEU) please visit MRC IEU’s GitHub Pages.
EpiGraphDB can be accessed and interacted with in the following ways:
- The interactive Web UI
- The API
- Example Jupyter notebooks
- The R package
For further details on the EpiGraphDB research project, please read our journal article published in Bioinformatics as well as the platform documentation.
Please get in touch with the team via GitHub issues, on Twitter, or by email.
EpiGraphDB team