Fuzzy ensemble clustering based on random projections for DNA microarray data analysis

Authors:
Roberto Avogadri;Giorgio Valentini
Affiliations:
DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy;DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy
Venue:
Artificial Intelligence in Medicine
Year:
2009

Abstract

Objective: Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in gene expression data analysis.

Methodology: We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and biomedical gene expression data. We apply random projections that obey the Johnson-Lindenstrauss lemma to obtain several lower-dimensional instances of the original high-dimensional gene expression data, approximately preserving the information and the metric structure of the original data. We then adopt a double fuzzy approach to obtain a consensus ensemble clustering: first we apply a fuzzy k-means algorithm to the different instances of the projected low-dimensional data, and then we use a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithm are proposed, which differ in the techniques used to combine the base clusterings and to obtain the final consensus clustering.

Results and conclusion: We applied the proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and compared the results with other state-of-the-art ensemble methods. Results show that in some cases, taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at the bio-molecular level. The reduction of the dimension of the data, achieved through random projection techniques, is well suited to the characteristics of high-dimensional gene expression data, resulting in improved performance with respect to single fuzzy k-means and to ensemble methods based on resampling techniques. Moreover, we show that the analysis of the accuracy and diversity of the base fuzzy clusterings can be useful to explain the advantages and the limitations of the proposed fuzzy ensemble approach.
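
The pipeline described in the methodology (random projection, fuzzy k-means on each projected instance, t-norm combination) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the ±1 projection matrix, the fuzzifier m = 2, the choice of the algebraic product as t-norm, and the averaging of pairwise similarity matrices across projections are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def random_projection(X, d_out, rng):
    """Project the rows of X (n x d) to d_out dimensions.

    Assumption: a +/-1 "binary coins" matrix scaled by 1/sqrt(d_out),
    one common Johnson-Lindenstrauss-style construction.
    """
    d = X.shape[1]
    R = rng.choice([-1.0, 1.0], size=(d, d_out))
    return X @ R / np.sqrt(d_out)

def fuzzy_kmeans(X, k, rng, m=2.0, n_iter=100, eps=1e-9):
    """Standard fuzzy k-means (fuzzy c-means) updates.

    Returns an (n x k) membership matrix whose rows sum to 1.
    """
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every point to every center (eps avoids division by zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def fuzzy_similarity(U):
    """Pairwise similarity using the algebraic-product t-norm T(a, b) = a * b:
    S[i, j] = sum over clusters c of U[i, c] * U[j, c].
    """
    return U @ U.T

def ensemble_similarity(X, k, d_out, n_proj, seed=0):
    """Average the fuzzy similarity matrices of n_proj base clusterings,
    each computed on an independent random projection of X.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = np.zeros((n, n))
    for _ in range(n_proj):
        X_low = random_projection(X, d_out, rng)
        S += fuzzy_similarity(fuzzy_kmeans(X_low, k, rng))
    return S / n_proj
```

Working with pairwise similarities sidesteps the problem of matching cluster labels across base clusterings. One possible way to obtain a final consensus partition, again only as an assumption, is to cluster the rows of the averaged matrix (for example with `fuzzy_kmeans`) and assign each example to its highest-membership cluster.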