Regulatory DNA sequences such as promoters and splice sites control gene expression and are important for successful gene prediction. Such sequences can be recognized by certain patterns, or motifs, that are conserved within a species. These patterns have many exceptions, which makes the structural analysis of regulatory sequences a complex problem. Grammar rules can describe the structure of regulatory sequences; however, deriving such rules manually is not trivial. In this paper, stochastic L-grammar rules are derived automatically from positive examples and counterexamples of regulatory sequences using genetic programming techniques. The fitness of a grammar ruleset is evaluated with a Support Vector Machine (SVM) classifier. The SVM is trained on known sequences to obtain a discriminant function, and a candidate ruleset is scored by the percentage of the sequences it generates that the SVM classifies correctly. Combining SVM classification with grammar rule inference can mitigate the lack of structural insight in machine learning approaches such as SVM.
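The fitness evaluation described above can be illustrated with a short Python sketch. It is not the paper's implementation and makes several assumptions. The grammar is a toy stochastic L-system over the nucleotide alphabet. Every symbol is rewritten in parallel, and each rule chooses a successor according to given probabilities. The trained SVM is replaced by a hypothetical hand-set linear decision function f(x) = w·φ(x) + b on dinucleotide frequencies, which has the form of a linear SVM's decision function. The names `rewrite`, `decision`, `fitness`, `WEIGHTS`, `BIAS`, `AT_RULES`, and `GC_RULES` are illustrative only.

```python
import random
from collections import Counter

# A stochastic L-grammar: each symbol maps to (probability, successor) pairs.
# Symbols without a rule are copied unchanged.
Rules = dict[str, list[tuple[float, str]]]

# Hypothetical stand-in for a trained linear SVM: f(x) = w . phi(x) + b,
# where phi(x) is the dinucleotide frequency vector of sequence x.
# These weights are illustrative, not learned from data.
WEIGHTS = {"TA": 1.0, "AT": 0.5, "GC": -1.0, "CG": -1.0}
BIAS = -0.1


def rewrite(axiom: str, rules: Rules, steps: int, rng: random.Random) -> str:
    """Rewrite every symbol in parallel, `steps` times (L-system style)."""
    word = axiom
    for _ in range(steps):
        parts = []
        for sym in word:
            options = rules.get(sym)
            if options is None:
                parts.append(sym)  # no rule: copy the symbol unchanged
                continue
            probs = [p for p, _ in options]
            successors = [s for _, s in options]
            # Pick one successor at random, weighted by the rule probabilities.
            parts.append(rng.choices(successors, weights=probs, k=1)[0])
        word = "".join(parts)
    return word


def decision(seq: str) -> float:
    """Linear score over dinucleotide frequencies; > 0 means 'regulatory'."""
    kmers = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not kmers:
        return BIAS
    counts = Counter(kmers)
    score = sum(WEIGHTS.get(k, 0.0) * c for k, c in counts.items())
    return score / len(kmers) + BIAS


def fitness(rules: Rules, axiom: str, steps: int, n: int,
            target: int = 1, seed: int = 0) -> float:
    """Fraction of n generated sequences that the classifier labels `target`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        seq = rewrite(axiom, rules, steps, rng)
        label = 1 if decision(seq) > 0 else -1
        if label == target:
            hits += 1
    return hits / n


# Two toy candidate rulesets: one generates only A/T, the other only G/C.
AT_RULES: Rules = {"A": [(0.5, "AT"), (0.5, "TA")], "T": [(0.5, "TA"), (0.5, "AT")]}
GC_RULES: Rules = {"G": [(0.5, "GC"), (0.5, "CG")], "C": [(0.5, "CG"), (0.5, "GC")]}

if __name__ == "__main__":
    print(fitness(AT_RULES, "A", steps=4, n=100))  # 1.0
    print(fitness(GC_RULES, "G", steps=4, n=100))  # 0.0
```

Under these toy weights, every A/T sequence scores above zero and every G/C sequence scores below zero. The first ruleset therefore reaches full fitness for the positive class, and the second reaches none. A genetic-programming search would call something like `fitness` on each candidate ruleset and favor the higher-scoring ones. With a real trained SVM, `decision` would be replaced by that model's decision function.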