Mass spectrometry (MS)-based proteomics has become a standard approach to biomarker discovery and early disease detection at the proteome level. However, the data generated by MS technology are noisy, highly redundant, and, most importantly, have a low sample-to-dimension ratio. Successful analysis of MS data therefore relies heavily on the algorithms and data mining techniques used to extract information. In this paper, we briefly discuss some commonly used algorithms for mass-to-charge (m/z) feature selection and propose a k-means clustering based hybrid algorithm to address the high dimensionality and high correlation associated with this task. The hybrid algorithm combines the advantages of filter- and wrapper-based feature selection. A special iterative procedure is introduced to overcome the instability of k-means clustering and the genetic algorithm. By comparing the proposed hybrid system with several popular m/z feature selection algorithms using 10 different classifiers, we show that the proposed algorithm selects m/z features with lower correlation while achieving higher sample classification accuracy. The m/z feature selection results also indicate that the proposed hybrid algorithm is very stable despite its stochastic nature.
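To make the clustering idea concrete, the following is a minimal sketch, not the algorithm of the paper: it groups correlated m/z features with plain k-means and keeps one representative per cluster using a simple filter score. The function names (`kmeans`, `class_separation`, `select_representatives`), the mean-difference score, and the toy data are all assumptions made for illustration; the wrapper stage (the genetic algorithm) and the iterative stabilization procedure mentioned in the abstract are omitted.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def class_separation(values, y):
    """Hypothetical filter score: absolute difference of the two class means."""
    pos = [v for v, t in zip(values, y) if t == 1]
    neg = [v for v, t in zip(values, y) if t == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def select_representatives(X, y, k, seed=0):
    """Cluster the columns of X (features), keep the best-scoring one per cluster."""
    columns = [list(col) for col in zip(*X)]  # one intensity profile per feature
    labels = kmeans(columns, k, seed=seed)
    best = {}  # cluster label -> (score, feature index)
    for j, (col, lab) in enumerate(zip(columns, labels)):
        score = class_separation(col, y)
        if lab not in best or score > best[lab][0]:
            best[lab] = (score, j)
    return sorted(j for _, j in best.values())

# Toy data (made up): 6 samples, 4 features. Features 0 and 1 are correlated
# and discriminative; features 2 and 3 are correlated and uninformative.
features = [
    [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    [1.2, 1.0, 1.1, 2.5, 2.6, 2.4],
    [5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
    [5.1, 6.1, 5.1, 6.1, 4.9, 5.9],
]
X = [list(row) for row in zip(*features)]  # rows are samples
y = [0, 0, 0, 1, 1, 1]
selected = select_representatives(X, y, k=2)
print(selected)  # one feature index per cluster
```

Choosing one feature per cluster is what reduces correlation among the selected features in this sketch: features with similar intensity profiles fall into the same cluster, so at most one of them is kept. Because k-means depends on its random initialization, rerunning with different `seed` values and comparing the selections is a simple way to probe stability.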