Molecular biology for computer scientists
Artificial intelligence and molecular biology
Feature generation for sequence categorization
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Modeling protein families using probabilistic suffix trees
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
Mining features for sequence classification
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Sequential Patterns: Generalizations and Performance Improvements
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth
Proceedings of the 17th International Conference on Data Engineering
Sequential PAttern mining using a bitmap representation
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Biostatistical Analysis (5th Edition)
Biostatistical Analysis (5th Edition)
Protein sequence pattern mining with constraints
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Time-sensitive feature mining for temporal sequence classification
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Mining multimodal sequential patterns: a case study on affect detection
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Hi-index | 0.00 |
We tackle the problem of sequence classification using relevant subsequences found in a dataset of protein labelled sequences. A subsequence is relevant if it is frequent and has a minimal length. For each query sequence a vector of features is obtained. The features consist in the number and average length of the relevant subsequences shared with each of the protein families. Classification is performed by combining these features in a Bayes Classifier. The combination of these characteristics results in a multi-class and multi-domain method that is exempt of data transformation and background knowledge. We illustrate the performance of our method using three collections of protein datasets. The performed tests showed that the method has an equivalent performance to state of the art methods in protein classification.