Protein sequence pattern mining with constraints

Authors:
Pedro Gabriel Ferreira;Paulo J. Azevedo
Affiliations:
Department of Informatics, University of Minho, Braga, Portugal;Department of Informatics, University of Minho, Braga, Portugal
Venue:
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2005

Citing 5
Cited 2

Sequence mining in categorical domains: incorporating constraints

Proceedings of the ninth international conference on Information and knowledge management
SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
Sequential PAttern mining using a bitmap representation

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Protein sequence classification through relevant sequence mining and bayes classifiers

EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence
Mining interesting user behavior patterns in mobile commerce environments

Applied Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Considering the characteristics of biological sequence databases, which typically have a small alphabet, a very long length and a relative small size (several hundreds of sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from the combination of some of the most efficient techniques used in sequence and itemset mining. The algorithm exhibits a high adaptability, yielding a smooth and direct introduction of various types of features into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth or a depth first traversal are possible. The experimental evaluation, in synthetic and real life protein databases, has shown that our algorithm has superior performance to state-of-the art algorithms. The use of constraints has also proved to be a very useful tool to specify user interesting patterns.