Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm

  • Authors:
  • Ka-Chun Wong;Chengbin Peng;Man-Hon Wong;Kwong-Sak Leung

  • Affiliations:
  • The Chinese University of Hong Kong, Department of Computer Science and Engineering, Shatin, Hong Kong and King Abdullah University of Science and Technology, Mathematical and Computer Sciences an ...;King Abdullah University of Science and Technology, Mathematical and Computer Sciences and Engineering Division, Jeddah, Kingdom of Saudi Arabia;The Chinese University of Hong Kong, Department of Computer Science and Engineering, Shatin, Hong Kong;The Chinese University of Hong Kong, Department of Computer Science and Engineering, Shatin, Hong Kong

  • Venue:
  • Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special issue on advances in computational intelligence and bioinformatics
  • Year:
  • 2011

Abstract

Protein-DNA binding is an essential biological activity, and understanding it forms the basis for further deciphering of biological and genetic systems. In particular, the bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play a central role in gene transcription. A recent study found a comprehensive set of TF-TFBS binding sequence pairs. However, those pairs are one-to-one mappings, which cannot fully reflect the many-to-many mappings within the bindings. An evolutionary algorithm is proposed to learn generalized representations (many-to-many mappings) from the TF-TFBS binding sequence pairs (one-to-one mappings). The generalized pairs are shown to be more meaningful than the original TF-TFBS binding sequence pairs, and some representative examples are analyzed in this study. In particular, the analysis shows that TF-TFBS binding sequence pairs should not be presumed to be one-to-one mappings; they can also exhibit many-to-many mappings. The proposed method can help extract such many-to-many information from the one-to-one TF-TFBS binding sequence pairs found in the previous study, providing further knowledge for understanding the bindings between TFs and TFBSs.
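
To make the distinction between one-to-one and many-to-many mappings concrete, the following is a minimal sketch of what a generalized pair could look like as a data structure. It is not the evolutionary algorithm proposed in the paper, whose details the abstract does not give. Instead it swaps in a much simpler technique: grouping pairs by connected components of a bipartite graph, so that pairs sharing a TF or TFBS sequence end up in one group. The function name `generalize_pairs` and all sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def generalize_pairs(pairs):
    """Group one-to-one (tf, tfbs) pairs into many-to-many groups.

    Two pairs land in the same group when they share a TF sequence or a
    TFBS sequence, directly or through a chain of such pairs. This is a
    connected-components stand-in, NOT the paper's evolutionary algorithm.
    """
    # Build a bipartite graph: TF nodes on one side, TFBS nodes on the other.
    neighbours = defaultdict(set)
    for tf, tfbs in pairs:
        neighbours[("tf", tf)].add(("tfbs", tfbs))
        neighbours[("tfbs", tfbs)].add(("tf", tf))

    seen = set()
    groups = []
    # Sorting the start nodes keeps the output order deterministic.
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        tfs, tfbss = set(), set()
        # Depth-first traversal collects one connected component.
        while stack:
            side, seq = stack.pop()
            (tfs if side == "tf" else tfbss).add(seq)
            for nxt in neighbours[(side, seq)]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append((sorted(tfs), sorted(tfbss)))
    return groups


# Made-up sequences, not data from the study.
pairs = [
    ("NRIAA", "TGACG"),
    ("NRIAA", "TGACA"),
    ("KRIAA", "TGACA"),
    ("CEGCK", "GGGCG"),
]

for tfs, tfbss in generalize_pairs(pairs):
    print(tfs, "<->", tfbss)
```

Each printed group is a set of TF sequences mapped to a set of TFBS sequences. Note that connected components can over-generalize, because a group does not guarantee that every TF in it pairs with every TFBS in it; a learned representation would presumably need a scoring criterion to avoid that.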