Selection of Statistical Features Based on Mutual Information for Classification of Human Coding and Non-coding DNA Sequences

  • Authors:
  • Alan Wee-Chung Liew; Yonghui Wu; Hong Yan

  • Affiliations:
  • City University of Hong Kong; City University of Hong Kong; City University of Hong Kong / University of Sydney, Australia

  • Venue:
  • ICPR '04: Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 3
  • Year:
  • 2004

Abstract

The classification of human gene sequences into exons and introns is an important but difficult problem. We study the discriminative power of 22 statistical features in terms of their mutual information (MI). Using correlation analysis, we identify a set of features that have high MI values and, at the same time, are complementary in their information content. With this feature set, which consists of the three SZ features, the AMI feature, and the first stop codon feature, we achieve a classification accuracy as high as 92%.