Feature extraction in protein sequences classification: a new stability measure

Authors:
Rabie Saidi;Sabeur Aridhi;Engelbert Mephu Nguifo;Mondher Maddouri
Affiliations:
LIMOS - UBP - Clermont University, BP, France;LIMOS - UBP - Clermont University, BP, France and LIPAH - FST - University of Tunis El Manar, Tunisia and FSJ - University of Jendouba, Tunisia;LIMOS - UBP - Clermont University, BP, France;LIPAH - FST - University of Tunis El Manar, Tunis, Tunisia
Venue:
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Year:
2012

Abstract

Feature extraction is an unavoidable task when preprocessing biological sequences. It typically transforms each sequence into a vector of motifs, where each motif is a subsequence that serves as a property (or attribute) characterizing the sequence. The result is an object-property table in which objects are sequences and properties are the motifs extracted from them. Standard machine learning tools can then be applied to this table to perform data mining tasks such as classification. Several previous works have described feature extraction methods for bio-sequence classification, but none of them discussed how robust these methods are when the input data is perturbed. In this work, we introduce the notion of stability of the generated motifs in order to study the robustness of motif extraction methods. We express this robustness in terms of the ability of a method to reveal any change occurring in the input data and its ability to target the interesting motifs. We use these criteria to evaluate and experimentally compare four existing extraction methods for biological sequences.
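
To make the object-property table concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper: motifs are approximated by fixed-length substrings (k-mers), and the function names `extract_motifs`, `build_table`, and `jaccard` are hypothetical. The Jaccard overlap at the end is only a generic way to see how a motif set shifts when the input is perturbed. It is not the stability measure the authors introduce.

```python
def extract_motifs(sequence, k=3):
    """Return the set of all length-k substrings (k-mers) of a sequence.

    Assumption: k-mers stand in for whatever motif extraction method is used.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_table(sequences, k=3):
    """Build a binary object-property table.

    Rows are sequences (objects); columns are motifs (properties).
    A cell is 1 if the motif occurs in the sequence, else 0.
    """
    motif_sets = [extract_motifs(s, k) for s in sequences]
    motifs = sorted(set().union(*motif_sets))
    table = [[1 if m in ms else 0 for m in motifs] for ms in motif_sets]
    return motifs, table


def jaccard(a, b):
    """Jaccard overlap of two motif collections (illustrative only)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two toy protein fragments that differ by one residue (L -> I).
motifs, table = build_table(["MKVLA", "MKVIA"], k=3)

# Motif sets extracted from an original and a perturbed input.
original_motifs, _ = build_table(["MKVLA"], k=3)
perturbed_motifs, _ = build_table(["MKVIA"], k=3)
overlap = jaccard(original_motifs, perturbed_motifs)
```

In this toy case, a single substitution changes two of the three motifs of the sequence, so the overlap between the two motif sets is low. An extraction method that reveals input changes would be expected to react in this way, while one that ignores them would return nearly identical motif sets.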