PolyA-iEP: A data mining method for the effective prediction of polyadenylation sites

Authors:
George Tzanis;Ioannis Kavakiotis;Ioannis Vlahavas
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

This paper presents a study of polyadenylation site prediction, an important problem in bioinformatics and medicine with particular relevance to cancer research. We describe PolyA-iEP, a method we developed for predicting polyadenylation sites, and we use it to study systematically the problem of recognizing mRNA 3' ends that contain a polyadenylation site. PolyA-iEP is a modular system with two main components, both of which contribute substantially to its descriptive and predictive potential. Specifically, PolyA-iEP combines the advantages of emerging patterns, namely high understandability and discriminating power, with the strength of a distance-based scoring method that we propose. The extracted emerging patterns may span many elements around the polyadenylation site and can provide novel and interesting biological insights. A classifier then combines the outputs of the two components in an effective framework that, in our setup, reaches 93.7% sensitivity and 88.2% specificity. PolyA-iEP can be parameterized and used for both descriptive and predictive analysis. We evaluated the method on Arabidopsis thaliana sequences and drew important conclusions.
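For readers unfamiliar with emerging patterns, the following minimal Python sketch illustrates the general idea: a pattern is "emerging" when its support grows markedly from one class of sequences to another. It is not the PolyA-iEP implementation. The representation of each sequence as a set of motif items, the function names, the toy data, and the `min_growth` threshold are all assumptions made for illustration.

```python
from typing import FrozenSet, List

# Assumption: each sequence is represented as a set of items (here, motif
# names). How PolyA-iEP actually encodes sequences is not stated in the abstract.
Itemset = FrozenSet[str]


def support(pattern: Itemset, dataset: List[Itemset]) -> float:
    """Fraction of sequences in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    hits = sum(1 for items in dataset if pattern <= items)
    return hits / len(dataset)


def growth_rate(pattern: Itemset, background: List[Itemset],
                target: List[Itemset]) -> float:
    """Ratio of the pattern's support in `target` to its support in `background`."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        # Absent from the background: growth is undefined (0) or unbounded (inf).
        return 0.0 if s_tg == 0.0 else float("inf")
    return s_tg / s_bg


def is_emerging(pattern: Itemset, background: List[Itemset],
                target: List[Itemset], min_growth: float = 2.0) -> bool:
    """True if the growth rate reaches the (hypothetical) threshold `min_growth`."""
    return growth_rate(pattern, background, target) >= min_growth


# Toy, made-up data: motifs observed near candidate sites (not from the paper).
positive = [
    frozenset({"AATAAA", "TGTA"}),
    frozenset({"AATAAA", "TGTA", "GT"}),
    frozenset({"AATAAA"}),
    frozenset({"TGTA", "GT"}),
]
negative = [
    frozenset({"GT"}),
    frozenset({"AATAAA"}),
    frozenset({"TGTA", "GT"}),
    frozenset({"GC"}),
]

pattern = frozenset({"AATAAA"})
rate = growth_rate(pattern, negative, positive)  # 0.75 / 0.25
print(rate, is_emerging(pattern, negative, positive))
```

In this toy example the pattern occurs in three of four positive sequences and one of four negative sequences, so its growth rate from the negative to the positive class is 3.0 and it passes the assumed threshold of 2.0.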