A feature generation algorithm for sequences with application to splice-site prediction

  • Authors:
  • Rezarta Islamaj;Lise Getoor;W. John Wilbur

  • Affiliations:
  • Computer Science Department, University of Maryland, College Park, MD;Computer Science Department, University of Maryland, College Park, MD;National Center for Biotechnology Information, NLM, NIH, Bethesda, MD

  • Venue:
  • PKDD'06: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2006

Abstract

In this paper we present a new approach to feature selection for sequence data. We identify general feature categories and give a construction algorithm for each of them. We show how these algorithms can be integrated into a system that tightly couples feature construction and feature selection. This integrated process, which we refer to as feature generation, allows us to systematically search a large space of potential features. We demonstrate the effectiveness of our approach on splice-site prediction, an important component of the gene-finding problem. We show that predictive models built using our feature generation algorithm achieve a significant improvement in accuracy over existing state-of-the-art approaches.
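
The idea of coupling feature construction with feature selection can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the feature categories or the selection criterion, so the sketch assumes a single category (k-mer presence) and a simple frequency-difference score. The names `score` and `generate_features`, the parameters `max_k` and `keep`, and the toy sequences are all hypothetical.

```python
ALPHABET = "ACGT"


def score(kmer, pos, neg):
    """Assumed selection criterion: absolute difference between the
    fraction of positive and negative sequences that contain kmer."""
    p = sum(kmer in s for s in pos) / len(pos)
    n = sum(kmer in s for s in neg) / len(neg)
    return abs(p - n)


def generate_features(pos, neg, max_k=3, keep=2):
    """Alternate construction and selection for max_k rounds.

    Selection: keep the `keep` best-scoring candidates of the round.
    Construction: extend each kept k-mer by one base to form the
    candidates of the next round, so only promising regions of the
    feature space are explored.
    """
    candidates = list(ALPHABET)
    selected = []
    for _ in range(max_k):
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(candidates, key=lambda m: (-score(m, pos, neg), m))
        best = ranked[:keep]
        selected.extend(best)
        candidates = [m + base for m in best for base in ALPHABET]
    return selected


# Toy data invented for illustration only (not from the paper).
positives = ["AAGGTAAGT", "CAGGTGAGT", "TTGGTAAGA"]
negatives = ["AACCTTACA", "CATTCCATC", "TTACATTCA"]

features = generate_features(positives, negatives)
print(features)
```

Because each round builds its candidates only from the features that survived the previous round, the search never enumerates all 4^k possible k-mers, which is the practical benefit of interleaving the two steps.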