Overlap-Based Similarity Metrics for Motif Search in DNA Sequences
ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
Identifying transcription factor binding sites in DNA sequences is a frequently performed task in bioinformatics. However, current search methods produce a large number of false positives because these motifs are short and degenerate. We propose an implicit model of cooperative binding of transcription factors. We hypothesize that the regions flanking true binding sites differ in composition from regions that do not contain a binding site. Using statistically significant motifs in the flanking regions of true binding sites as features, we design an SVM classifier for discriminating true binding sites from false positives. We demonstrate the effectiveness of our method on a data set of experimentally verified p53 binding sites, obtaining an overall accuracy of 80% on cross-validation and 76% on an independent test set. By analyzing the features, we identified known as well as potentially new binding partners of p53.
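The feature-construction idea can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-mer counts stand in for the statistically significant flanking motifs, and the function names (`flanking_regions`, `kmer_features`), the flank length, the value of k, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def flanking_regions(sequence, site_start, site_len, flank_len):
    """Return the (upstream, downstream) flanks of a candidate binding site."""
    upstream = sequence[max(0, site_start - flank_len):site_start]
    site_end = site_start + site_len
    downstream = sequence[site_end:site_end + flank_len]
    return upstream, downstream


def kmer_features(flanks, k=2):
    """Count every DNA k-mer over the flanks, in fixed lexicographic order.

    Assumption: raw k-mer counts are a simple stand-in for the
    statistically significant motifs used as features in the abstract.
    """
    counts = Counter()
    for region in flanks:
        for i in range(len(region) - k + 1):
            counts[region[i:i + k]] += 1
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


# Toy example (hypothetical data): a 10-bp candidate site at position 8.
sequence = "AACCGGTT" + "GGGCATGTCC" + "TTGGCCAA"
flanks = flanking_regions(sequence, site_start=8, site_len=10, flank_len=4)
features = kmer_features(flanks, k=2)
```

Each candidate site yields one fixed-length vector (16 entries for k = 2). Vectors of this kind, labeled as true or false binding sites, could then be passed to any SVM implementation for training.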