ARTS

Authors:
Sören Sonnenburg;Alexander Zien;Gunnar Rätsch
Venue:
Bioinformatics
Year:
2006

Cited by 12

Linear-Time Computation of Similarity Measures for Sequential Data
The Journal of Machine Learning Research

Human Pol II promoter prediction by using nucleotide property composition features
ISB '10 Proceedings of the International Symposium on Biocomputing

SCS: Signal, Context, and Structure Features for Genome-Wide Human Promoter Recognition
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

The SHOGUN Machine Learning Toolbox
The Journal of Machine Learning Research

A unifying view of multiple kernel learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II

A metastate HMM with application to gene structure identification in eukaryotes
EURASIP Journal on Advances in Signal Processing - Special issue on genomic signal processing

lp-Norm Multiple Kernel Learning
The Journal of Machine Learning Research

Efficient algorithms for similarity measures over sequential data: a look beyond kernels
DAGM'06 Proceedings of the 28th conference on Pattern Recognition

The Poisson margin test for normalisation free significance analysis of NGS data
RECOMB'10 Proceedings of the 14th Annual international conference on Research in Computational Molecular Biology

Accurately predicting transcription start sites using logitlinear model and local oligonucleotide frequencies
ICIC'11 Proceedings of the 7th international conference on Intelligent Computing: bio-inspired computing and applications

Similarity measures for sequential data
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Sally: a tool for embedding strings in vector spaces
The Journal of Machine Learning Research

Abstract

We develop new methods for finding transcription start sites (TSS) of RNA Polymerase II binding genes in genomic DNA sequences. Employing Support Vector Machines with advanced sequence kernels, we achieve drastically higher prediction accuracies than state-of-the-art methods.

Motivation: Protein-coding genes are among the most important features of genomic DNA. While it is of great value to identify those genes and the proteins they encode, it is also crucial to understand how their transcription is regulated. To this end, one has to identify the corresponding promoters and the transcription factor binding sites they contain. TSS finders can be used to locate potential promoters. They may also be combined with other signal and content detectors to resolve entire gene structures.

Results: We have developed a novel kernel-based method, called ARTS, that accurately recognizes transcription start sites in the human genome. Applying Support Vector Machines, which would otherwise be too computationally expensive, was made possible by efficient training and evaluation techniques using suffix tries. In a carefully designed experimental study, we compare our TSS finder to state-of-the-art methods from the literature: McPromoter, Eponine and FirstEF. For given false positive rates within a reasonable range, we consistently achieve considerably higher true positive rates. For instance, ARTS finds about 35% true positives at a false positive rate of 1/1000, where the other methods find about half as many (18%).

Availability: Datasets, model selection results, whole genome predictions, and additional experimental results are available at

Contact: Gunnar.Raetsch@tuebingen.mpg.de
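
The comparison in the Results is stated as a true positive rate at a fixed false positive rate (for example, 1/1000). The sketch below shows one way such a figure could be computed from classifier scores. It is an illustration under stated assumptions, not the evaluation code used for ARTS: the function name `tpr_at_fpr`, the label convention (1 for a true TSS, 0 for a decoy) and the toy data are all hypothetical.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the highest true positive rate reachable with FPR <= max_fpr.

    scores: classifier outputs, where a higher score means "more likely a TSS".
    labels: 1 for true TSS positions, 0 for decoy positions (assumed convention).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Lower the decision threshold step by step, highest score first.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Examples with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # this threshold already exceeds the allowed FPR
        best_tpr = tp / n_pos
        i = j
    return best_tpr


# Toy data, made up for illustration only (not from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
```

On a genome-wide benchmark the same function would be called with `max_fpr=1/1000`. Because negatives vastly outnumber positives in that setting, a fixed low false positive rate is a more informative operating point than overall accuracy.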