Structural analysis of regulatory DNA sequences using grammar inference and Support Vector Machine

Authors:
Robertas Damaševičius
Affiliations:
Software Engineering Department, Kaunas University of Technology, Student 50-415, LT-51368, Kaunas, Lithuania
Venue:
Neurocomputing
Year:
2010


Abstract

Regulatory DNA sequences such as promoters and splice sites control gene expression and are important for successful gene prediction. Such sequences can be recognized by characteristic patterns, or motifs, that are conserved within a species. These patterns have many exceptions, which makes the structural analysis of regulatory sequences a complex problem. Grammar rules can describe the structure of regulatory sequences; however, deriving such rules manually is difficult. In this paper, stochastic L-grammar rules are derived automatically from positive examples and counterexamples of regulatory sequences using genetic programming techniques. The fitness of the grammar rules is evaluated with a Support Vector Machine (SVM) classifier. The SVM is trained on known sequences to obtain a discriminant function, which is then used to evaluate a candidate grammar ruleset by determining the percentage of sequences generated by the ruleset that the SVM classifies correctly. Combining SVM classification with grammar rule inference can compensate for the lack of structural insight in machine learning approaches such as SVM.
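The fitness evaluation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: a candidate ruleset is a stochastic L-grammar over a single hypothetical nonterminal `S`, rewritten in parallel for a fixed number of steps; the SVM is assumed to be linear over overlapping 2-mer counts, with hypothetical `weights` and `bias` standing in for a trained model; and because the ruleset is meant to model the positive class, a generated sequence counts as correctly classified when its decision value is positive. The genetic programming loop that mutates and recombines rulesets is not shown.

```python
import random
from collections import Counter

# Hypothetical encoding of a stochastic L-grammar: each nonterminal maps to
# (probability, replacement) pairs. Symbols without a rule (here the
# nucleotides A, C, G, T) are copied unchanged at every step.
Rules = dict[str, list[tuple[float, str]]]
NUCLEOTIDES = set("ACGT")


def generate(axiom: str, rules: Rules, steps: int, rng: random.Random) -> str:
    """Rewrite every symbol in parallel for `steps` iterations (L-system style),
    then drop any leftover nonterminals so only nucleotides remain."""
    s = axiom
    for _ in range(steps):
        out = []
        for ch in s:
            if ch in rules:
                probs, replacements = zip(*rules[ch])
                out.append(rng.choices(replacements, weights=probs)[0])
            else:
                out.append(ch)
        s = "".join(out)
    return "".join(ch for ch in s if ch in NUCLEOTIDES)


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts, used here as the SVM feature vector."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def svm_decision(seq: str, weights: dict[str, float], bias: float, k: int = 2) -> float:
    """Assumed linear SVM discriminant f(x) = w . phi(x) + b over k-mer counts.
    `weights` and `bias` stand in for an already trained model."""
    return sum(weights.get(m, 0.0) * n for m, n in kmer_counts(seq, k).items()) + bias


def fitness(rules: Rules, axiom: str, steps: int, weights: dict[str, float],
            bias: float, n_samples: int = 200, seed: int = 0) -> float:
    """Fraction of generated sequences with f(x) > 0, i.e. classified as
    positive (correct, assuming the ruleset models the positive class)."""
    rng = random.Random(seed)
    hits = sum(
        svm_decision(generate(axiom, rules, steps, rng), weights, bias) > 0
        for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical candidate ruleset: S grows TA-rich sequences with some GC noise.
    candidate = {"S": [(0.7, "TAS"), (0.3, "GCS")]}
    # Hypothetical weights standing in for a trained linear SVM.
    w = {"TA": 1.0, "AT": 0.5, "GC": -1.0, "CG": -1.0}
    print(fitness(candidate, "S", steps=8, weights=w, bias=-2.0))
```

A linear model over k-mer counts keeps the example short. With a kernel SVM, `svm_decision` would instead compute the sum of α_i y_i K(x_i, x) over the support vectors plus the bias b, and a genetic programming search would keep the rulesets with the highest `fitness` values.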