Introduction
Identifying the locations of transcription factor (TF) binding sites across the genome is a common challenge in genomics and computational biology. Chromatin accessibility assays such as DNase I sequencing (DNase-seq) and the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) produce genome-wide data that contain distinct “footprint” patterns at binding sites. A class of computational methods called footprinters has been developed to model and identify these patterns as a means of locating TF binding sites. For an overview of genomic footprinting, see [1] and [2].
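To make the idea of a footprint concrete: a bound TF shields the DNA it occupies from DNase I cleavage (or Tn5 transposition in ATAC-seq), so per-base cut counts tend to dip over the motif relative to its flanks. The minimal sketch below scores that dip with a simple flank-to-center ratio on made-up counts. The function name `footprint_depletion`, the window sizes, and the pseudocount are illustrative assumptions, not the features DeFCoM actually uses.

```python
from statistics import mean


def footprint_depletion(cuts, motif_start, motif_end, flank=10, pseudocount=1.0):
    """Hypothetical footprint score: mean flank cut density / mean motif cut density.

    cuts: per-base cut (or insertion) counts over a window around a motif site.
    Values above 1 mean fewer cuts over the motif than in its flanks, the
    pattern expected when a bound TF protects its site. The pseudocount
    avoids division by zero at motifs with no cuts.
    """
    center = cuts[motif_start:motif_end]
    left = cuts[max(0, motif_start - flank):motif_start]
    right = cuts[motif_end:motif_end + flank]
    flanks = left + right
    if not center or not flanks:
        raise ValueError("window too small for the requested motif/flank sizes")
    return (mean(flanks) + pseudocount) / (mean(center) + pseudocount)


# Made-up counts: a 6-bp motif at positions 10-15 inside a 26-bp window.
bound = [5] * 10 + [1] * 6 + [5] * 10   # cuts depleted over the motif
unbound = [3] * 26                      # uniform cutting, no footprint
print(footprint_depletion(bound, 10, 16))    # 3.0
print(footprint_depletion(unbound, 10, 16))  # 1.0
```

Real footprinters work with noisier, strand- and bias-aware signals; this ratio only illustrates the shape of the pattern they look for.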
DeFCoM (Detecting Footprints Containing Motifs) is an SVM-based supervised learning footprinter. The figure below provides a general overview of the main steps in the DeFCoM framework. Given a set of predicted motif sites for a TF, each annotated as active (TF-bound) or inactive (TF-unbound) in a specific cell type or experimental condition, an SVM classifier is trained on features derived, for each motif site, from DNase-seq or ATAC-seq data from that same cell type or condition. The trained model is then used to predict active and inactive sites in another cell type or condition based only on its DNase-seq or ATAC-seq data.
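The train-then-predict workflow described above can be sketched as follows. This is not DeFCoM's implementation: its actual features and SVM configuration are not described in this section. The sketch assumes two hypothetical, already standardized features per motif site (for example, a footprint depletion score and flank accessibility) and swaps in a from-scratch linear SVM trained with Pegasos-style stochastic subgradient descent. The names `train_linear_svm` and `predict`, and all data values, are illustrative.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    X: feature vectors, one per motif site; y: labels, +1 = active, -1 = inactive.
    Returns a weight vector whose last element is the bias. For simplicity the
    bias is regularized together with the weights (a standard SVM does not do this).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]  # constant feature carries the bias
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            # Gradient step on the L2 penalty (lam / 2) * ||w||^2 ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus the hinge-loss subgradient when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, X):
    """Classify feature vectors: +1 (predicted active) or -1 (predicted inactive)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1
            for x in X]


# Hypothetical standardized features per motif site in the training cell type,
# e.g. [footprint depletion score, flank accessibility].
train_X = [[1.2, 0.8], [0.9, 1.1], [1.5, 0.6], [1.0, 1.0],          # active sites
           [-1.1, -0.9], [-0.8, -1.2], [-1.4, -0.5], [-1.0, -1.0]]  # inactive sites
train_y = [1, 1, 1, 1, -1, -1, -1, -1]

w = train_linear_svm(train_X, train_y)

# Motif sites from a second (hypothetical) cell type: only accessibility-derived
# features are needed, no binding labels.
new_X = [[1.3, 0.9], [-1.2, -0.8]]
print(predict(w, new_X))  # [1, -1]
```

The key design point mirrored here is that labels are needed only for the training cell type; prediction in the new cell type uses the learned weights plus features computed from that cell type's accessibility data alone.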
References
[1] Vierstra, J., & Stamatoyannopoulos, J. A. (2016). Genomic footprinting. Nature Methods, 13(3), 213-221.
[2] Sung, M. H., Baek, S., & Hager, G. L. (2016). Genome-wide footprinting: ready for prime time? Nature Methods, 13(3), 222-228.