Selecting predictive features for recognition of hypersensitive sites of regulatory genomic sequences with an evolutionary algorithm

Authors:
Uday Kamath, Kenneth A. De Jong, Amarda Shehu
Affiliations:
George Mason University, Fairfax, VA, USA (all authors)
Venue:
Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO '10)
Year:
2010


Abstract

This paper proposes a method to improve the recognition of regulatory genomic sequences. Annotating sequences that regulate gene transcription is an emerging challenge in genomics research. Identifying regulatory sequences promises to reveal the underlying reasons for phenotypic differences among cells and for diseases associated with pathologies in protein expression. Computational approaches have been limited by the scarcity of experimentally known features specific to regulatory sequences. High-throughput experimental technology is finally revealing a wealth of hypersensitive (HS) sequences, which are reliable markers of regulatory sequences and are currently the focus of classification methods. The contribution of this paper is a novel method that combines evolutionary computation and support vector machine (SVM) classification to improve the recognition of HS sequences. Based on experimental evidence that HS regions employ sequence features to interact with enzymes, the method searches for sequence motifs that discriminate between HS and non-HS sequences. An evolutionary algorithm (EA) searches the space of candidate sequences of varying lengths for such motifs. Experiments show that these motifs improve the recognition of HS sequences by more than 10% compared to state-of-the-art classification methods. Analysis of the motifs also provides insight into the features that regulatory sequences employ to interact with DNA-binding enzymes.
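The abstract describes the approach only at a high level. As an illustration of the general idea (an EA searching over variable-length DNA motifs, with the best motifs turned into feature vectors for a classifier), here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the EA scores motifs, so the fitness used here is a hypothetical stand-in: the fraction of HS sequences containing a motif minus the fraction of non-HS sequences containing it. The mutation operators, truncation selection, toy sequences, and all names and parameter values (`evolve_motifs`, `featurize`, `pop_size`, `min_len`, `max_len`, `top_k`) are likewise assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"


def fitness(motif: str, pos: list[str], neg: list[str]) -> float:
    """Hypothetical discriminative score (not the paper's): fraction of HS
    (positive) sequences containing the motif minus the fraction of non-HS
    (negative) ones. Both lists must be non-empty."""
    p = sum(motif in s for s in pos) / len(pos)
    n = sum(motif in s for s in neg) / len(neg)
    return p - n


def random_motif(rng: random.Random, min_len: int, max_len: int) -> str:
    k = rng.randint(min_len, max_len)  # motif lengths vary within the bounds
    return "".join(rng.choice(ALPHABET) for _ in range(k))


def mutate(motif: str, rng: random.Random, min_len: int, max_len: int) -> str:
    """Apply one substitution, insertion, or deletion, keeping the length
    within [min_len, max_len]."""
    ops = ["sub"]
    if len(motif) < max_len:
        ops.append("ins")
    if len(motif) > min_len:
        ops.append("del")
    op = rng.choice(ops)
    if op == "ins":
        i = rng.randrange(len(motif) + 1)
        return motif[:i] + rng.choice(ALPHABET) + motif[i:]
    i = rng.randrange(len(motif))
    if op == "sub":
        return motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]
    return motif[:i] + motif[i + 1:]


def rank(motifs, pos, neg):
    """Unique motifs sorted by descending fitness; ties broken alphabetically."""
    return sorted(set(motifs), key=lambda m: (-fitness(m, pos, neg), m))


def evolve_motifs(pos, neg, pop_size=30, generations=40,
                  min_len=3, max_len=8, top_k=5, seed=0):
    rng = random.Random(seed)
    pop = [random_motif(rng, min_len, max_len) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection with elitism: the best half survive as parents.
        parents = rank(pop, pos, neg)[: max(2, pop_size // 2)]
        children = [mutate(rng.choice(parents), rng, min_len, max_len)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return rank(pop, pos, neg)[:top_k]


def featurize(seq: str, motifs: list[str]) -> list[int]:
    """Count (possibly overlapping) occurrences of each motif in seq."""
    return [sum(seq[i:i + len(m)] == m for i in range(len(seq) - len(m) + 1))
            for m in motifs]


if __name__ == "__main__":
    hs = ["GGTATAATCC", "TATAATGGGG", "CCCCTATAAT"]      # toy HS sequences
    non_hs = ["GGGGCCCCGG", "CCGGCCGGCC", "GCGCGCGCGC"]  # toy non-HS sequences
    motifs = evolve_motifs(hs, non_hs)
    for m in motifs:
        print(m, round(fitness(m, hs, non_hs), 2))
    print([featurize(s, motifs) for s in hs + non_hs])
```

In this sketch the motif-count vectors produced by `featurize` are where an SVM would come in: they could be passed to any SVM implementation (for example, scikit-learn's `sklearn.svm.SVC`) to classify HS versus non-HS sequences. That step is omitted here to keep the example dependency-free.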