Proceedings of the 12th International Workshop on Data Mining in Bioinformatics
Feature extraction is an essential task in the preprocessing of biological sequences. This step consists, for example, of transforming the sequences into vectors of motifs, where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. The result is an object-property table in which the objects are sequences and the properties are the motifs extracted from them. Standard machine learning tools can then be applied to this table to perform data mining tasks such as classification. Several previous works have described feature extraction methods for bio-sequence classification, but none of them discussed how robust these methods are when the input data is perturbed. In this work, we introduce the notion of stability of the generated motifs in order to study the robustness of motif extraction methods. We express this robustness in terms of the ability of a method to reveal any change occurring in the input data, and its ability to target the interesting motifs. We use these criteria to evaluate and experimentally compare four existing extraction methods for biological sequences.
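The object-property table described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that motifs are fixed-length subsequences (k-mers); this is not one of the four extraction methods evaluated in the paper, and the function names `extract_kmers` and `build_object_property_table` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def extract_kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of length-k subsequences of `sequence`.

    Assumption: k-mers stand in for motifs here; real motif
    extraction methods are usually more selective.
    """
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_object_property_table(
    sequences: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary object-property table.

    Objects (rows) are sequences, properties (columns) are motifs.
    A cell is 1 if the motif occurs in the sequence, else 0.
    """
    per_sequence = {name: extract_kmers(seq, k) for name, seq in sequences.items()}
    # Columns: every motif found in at least one sequence, in sorted order.
    motifs = sorted(set().union(*per_sequence.values()))
    table = {
        name: [1 if motif in found else 0 for motif in motifs]
        for name, found in per_sequence.items()
    }
    return motifs, table


if __name__ == "__main__":
    # Toy nucleotide sequences, invented for this example.
    toy = {"s1": "ACGA", "s2": "CGAT"}
    columns, rows = build_object_property_table(toy, k=2)
    print(columns)
    for name, row in rows.items():
        print(name, row)
```

Each row of the resulting table is a feature vector that could be passed to a standard classifier. Comparing the motif columns obtained before and after modifying the input sequences is one simple way to observe how an extraction method reacts to perturbation.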