Protein sequence classification through relevant sequence mining and Bayes classifiers

Authors:
Pedro Gabriel Ferreira;Paulo J. Azevedo
Affiliations:
Department of Informatics, University of Minho, Braga, Portugal;Department of Informatics, University of Minho, Braga, Portugal
Venue:
EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence
Year:
2005

Abstract

We tackle the problem of sequence classification using relevant subsequences found in a dataset of labelled protein sequences. A subsequence is relevant if it is frequent and has at least a minimum length. For each query sequence, a feature vector is computed: the features are the number and the average length of the relevant subsequences it shares with each protein family. Classification is performed by combining these features in a Bayes classifier. Together, these characteristics yield a multi-class, multi-domain method that requires neither data transformation nor background knowledge. We illustrate the performance of the method on three collections of protein datasets. Our tests show that the method performs on par with state-of-the-art methods for protein classification.
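The abstract does not specify the mining algorithm or the exact form of the Bayes classifier. The following is therefore only a minimal, hypothetical sketch of the overall pipeline, built on several assumptions:

- Relevant subsequences are treated as contiguous substrings, although the paper's patterns may allow gaps.
- They are mined per family by brute-force enumeration, using the hypothetical parameters `min_support`, `min_len` and `max_len`.
- Each query gets two features per family: the count and the average length of the shared relevant patterns.
- The features are combined with a hand-written Gaussian naive Bayes (`GaussianNaiveBayes`), which is our choice and not necessarily the authors'.

The toy sequences are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def substrings(seq, min_len, max_len):
    """All distinct contiguous substrings of seq with min_len <= length <= max_len."""
    out = set()
    for i in range(len(seq)):
        for length in range(min_len, min(max_len, len(seq) - i) + 1):
            out.add(seq[i:i + length])
    return out

def mine_relevant(seqs, min_support, min_len, max_len):
    """Assumed notion of 'relevant': substrings of at least min_len characters
    that occur in at least a fraction min_support of the family's sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(substrings(s, min_len, max_len))  # count each pattern once per sequence
    threshold = min_support * len(seqs)
    return {p for p, c in counts.items() if c >= threshold}

def features(query, relevant_by_family, families):
    """Per family: number of its relevant patterns found in the query, and their mean length."""
    vec = []
    for f in families:
        shared = [p for p in relevant_by_family[f] if p in query]
        vec.append(float(len(shared)))
        vec.append(sum(map(len, shared)) / len(shared) if shared else 0.0)
    return vec

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature); var_eps keeps variances positive."""
    def fit(self, X, y, var_eps=1e-3):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.stats = {}
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [sum((v - m) ** 2 for v in col) / n + var_eps
                         for col, m in zip(columns, means)]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=log_posterior)

# Invented toy families, for illustration only.
train = {
    "famA": ["MKVLAAGIL", "MKVLAAGVL", "TMKVLAAGI"],
    "famB": ["GGSWRTPLE", "AGSWRTPLQ", "GGSWRTPLD"],
}
families = sorted(train)
relevant = {f: mine_relevant(s, min_support=0.6, min_len=3, max_len=6)
            for f, s in train.items()}
X = [features(s, relevant, families) for f in families for s in train[f]]
y = [f for f in families for _ in train[f]]
clf = GaussianNaiveBayes().fit(X, y)
print(clf.predict(features("QMKVLAAGW", relevant, families)))  # famA
```

In this sketch, the training feature vectors are computed on the same sequences the patterns were mined from, which inflates their scores. A faithful evaluation would mine patterns and fit the classifier on separate folds. Brute-force substring enumeration is also only practical for short sequences. Sequential-pattern miners would replace it in practice.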