DecodeGenetics/Stefanssonetal-Nature-Genetics-2024

Code used in the publication:

The correlation between CpG methylation and gene expression is driven by sequence variants

Authors: Olafur Andri Stefansson1*, Brynja Dogg Sigurpalsdottir1,2, Solvi Rognvaldsson1, Gisli Hreinn Halldorsson1,3, Kristinn Juliusson1, Gardar Sveinbjornsson1, Bjarni Gunnarsson1, Doruk Beyter1, Hakon Jonsson1, Sigurjon Axel Gudjonsson1, Thorunn Asta Olafsdottir1,4, Saedis Saevarsdottir1,4, Magnus Karl Magnusson1,4, Sigrun Helga Lund1,3, Vinicius Tragante1, Asmundur Oddsson1, Marteinn Thor Hardarson1,2, Hannes Petur Eggertsson1, Reynir L. Gudmundsson1, Sverrir Sverrisson1, Michael L. Frigge1, Florian Zink1, Hilma Holm1, Hreinn Stefansson1, Thorunn Rafnar1, Ingileif Jonsdottir1,4, Patrick Sulem1, Agnar Helgason1,5, Daniel F. Gudbjartsson1,3, Bjarni V. Halldorsson1,2, Unnur Thorsteinsdottir1,4, Kari Stefansson1,4*

Affiliations:

  1. deCODE genetics / Amgen Inc., Sturlugata 8, Reykjavik, Iceland
  2. School of Technology, Reykjavik University, Reykjavik, Iceland
  3. School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
  4. Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
  5. Department of Anthropology, University of Iceland, Reykjavik, Iceland

*Correspondence to: Olafur Andri Stefansson (olafurs@decode.is), Kari Stefansson (kstefans@decode.is)

Abstract: Gene promoter and enhancer sequences are bound by transcription factors (TFs) and depleted of methylated CpG sites. The absence of methylated CpGs in these sequences typically correlates with increased gene expression, indicating a regulatory role for methylation. We used nanopore sequencing to determine haplotype-specific methylation rates of 15.3 million CpG units in 7,179 whole-blood genomes. We identified 189,178 methylation-depleted sequences (MDSs) where three or more proximal CpGs were unmethylated on at least one haplotype. Of these, 77,789 MDSs (~41%) associated with 80,503 cis-acting sequence variants, which we termed allele-specific methylation QTLs (ASM-QTLs). RNA sequencing of 896 samples from the same blood draws used for nanopore sequencing showed that the ASM-QTLs, i.e., DNA sequence variability, drive most of the correlation found between gene expression and CpG methylation. ASM-QTLs were enriched 46.4-fold (95% CI: 36.0, 58.7) among sequence variants associating with hematological traits, demonstrating that ASM-QTLs are important functional units in the non-coding genome.
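
To make the MDS definition in the abstract concrete, the following is a minimal illustrative sketch, not the code used in the publication. It scans the CpGs of a single haplotype and reports runs of three or more proximal CpGs that look unmethylated. The function name `find_depleted_runs`, the methylation-rate cutoff `max_rate`, and the distance cutoff `max_gap` are all assumptions made for illustration; the abstract does not state the thresholds the authors used.

```python
from typing import List, Sequence, Tuple


def find_depleted_runs(
    positions: Sequence[int],
    rates: Sequence[float],
    max_rate: float = 0.2,   # assumed cutoff for calling a CpG "unmethylated"
    max_gap: int = 200,      # assumed maximum distance (bp) between "proximal" CpGs
    min_cpgs: int = 3,       # "three or more proximal CpGs" from the abstract
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_cpgs) for each run of low-methylation CpGs.

    `positions` must be sorted genomic coordinates on one chromosome and
    `rates` the per-CpG methylation rates (0..1) for a single haplotype.
    """
    runs: List[Tuple[int, int, int]] = []
    current: List[int] = []  # positions of CpGs in the run being built

    for pos, rate in zip(positions, rates):
        low = rate <= max_rate
        if low and (not current or pos - current[-1] <= max_gap):
            # Extend (or start) the current run.
            current.append(pos)
        else:
            # Run is broken by a methylated CpG or a large gap: flush it.
            if len(current) >= min_cpgs:
                runs.append((current[0], current[-1], len(current)))
            current = [pos] if low else []

    # Flush a run that reaches the end of the input.
    if len(current) >= min_cpgs:
        runs.append((current[0], current[-1], len(current)))
    return runs
```

Under these assumptions, the "at least one haplotype" condition would correspond to calling the function once per haplotype and keeping a region if either call reports it.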