Article, Open Access
Peer-reviewed
Published

On the value of intra-motif dependencies of human insulator protein CTCF

Affiliation
Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
Eggeling, Ralf;
Affiliation
Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
Gohr, André;
GND
1013858662
Affiliation
Julius Kühn-Institut (JKI), Federal Research Centre of Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Quedlinburg, Germany; Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
Keilwagen, Jens;
Affiliation
Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
Mohr, Michaela;
Affiliation
Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
Posch, Stefan;
Affiliation
Molecular and Computational Biology, University of Southern California, Los Angeles, United States of America
Smith, Andrew D.;
Affiliation
Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany; Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany; German Center of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
Grosse, Ivo

The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has changed, making it possible to exploit statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. This analysis leads to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3′ end.
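
To make the independence assumption of the position weight matrix concrete, the following minimal sketch compares a position weight matrix with an inhomogeneous first-order Markov model on a handful of aligned toy sites. It does not implement the parsimonious Markov models used in this work; the first-order model is a simpler stand-in, and the function names, the pseudocount, and the toy sequences are assumptions chosen purely for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def pwm_log_likelihood(sites, pseudocount=1.0):
    """Log-likelihood of aligned sites under a position weight matrix.

    Every position is modeled independently of all other positions.
    """
    width = len(sites[0])
    denom = len(sites) + pseudocount * len(ALPHABET)
    total = 0.0
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        for site in sites:
            total += math.log((counts[site[pos]] + pseudocount) / denom)
    return total


def first_order_log_likelihood(sites, pseudocount=1.0):
    """Log-likelihood under an inhomogeneous first-order Markov model.

    Each position after the first is conditioned on its left neighbor,
    with a separate conditional distribution per position.
    """
    width = len(sites[0])
    # The first position has no predecessor, so it is modeled like a PWM column.
    first_counts = Counter(site[0] for site in sites)
    first_denom = len(sites) + pseudocount * len(ALPHABET)
    total = sum(
        math.log((first_counts[site[0]] + pseudocount) / first_denom)
        for site in sites
    )
    for pos in range(1, width):
        pair_counts = Counter((site[pos - 1], site[pos]) for site in sites)
        context_counts = Counter(site[pos - 1] for site in sites)
        for site in sites:
            context, base = site[pos - 1], site[pos]
            numerator = pair_counts[(context, base)] + pseudocount
            denominator = context_counts[context] + pseudocount * len(ALPHABET)
            total += math.log(numerator / denominator)
    return total


# Hypothetical toy data: positions 2 and 3 are perfectly coupled (CG or TA).
toy_sites = ["ACGT"] * 10 + ["ATAT"] * 10

pwm_score = pwm_log_likelihood(toy_sites)
markov_score = first_order_log_likelihood(toy_sites)
print(f"PWM log-likelihood:         {pwm_score:.3f}")
print(f"First-order log-likelihood: {markov_score:.3f}")
```

On these toy sites the first-order model attains the higher log-likelihood because it captures the coupling between neighboring positions that the position weight matrix ignores. Both scores are computed on the training data itself, so the comparison says nothing about overfitting, which is the concern that motivates parsimonious models when data are limited.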
