Hybrid cluster ensemble framework based on the random combination of data transformation operators

Authors:
Zhiwen Yu;Hau-San Wong;Jane You;Guoxian Yu;Guoqiang Han
Affiliations:
School of Computer Science and Engineering, South China University of Technology, China and Department of Computing, Hong Kong Polytechnic University, Hong Kong;Department of Computer Science, City University of Hong Kong, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hong Kong;School of Computer Science and Engineering, South China University of Technology, China;School of Computer Science and Engineering, South China University of Technology, China
Venue:
Pattern Recognition
Year:
2012

Abstract

Given a dataset P represented by an n × m matrix (where n is the number of data points and m is the number of attributes), we study how applying transformations to P affects the performance of different ensemble algorithms. Specifically, a dataset P can be transformed into a new dataset P' by a set of transformation operators Φ in the instance dimension, such as sub-sampling, super-sampling, and noise injection, and by a corresponding set of transformation operators Ψ in the attribute dimension. Based on these conventional transformation operators Φ and Ψ, a general form Ω of the transformation operator is proposed to represent different kinds of transformation operators. Then, two new data transformation operators, the probabilistic-based data sampling operator and the probabilistic-based attribute sampling operator, are designed to generate new datasets in the ensemble. Next, three new random transformation operators are proposed: the random combination of transformation operators in the data dimension, in the attribute dimension, and in both dimensions. Finally, a new cluster ensemble approach is proposed, which integrates the random combination of data transformation operators across different dimensions, a hybrid clustering technique, a confidence measure, and the normalized cut algorithm into the ensemble framework. The experiments show that (i) the random combination of transformation operators across different dimensions outperforms most of the conventional data transformation operators on different kinds of datasets, and (ii) the proposed cluster ensemble framework performs well on different datasets, such as gene expression datasets and datasets in the UCI machine learning repository.
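
The idea of randomly combining transformation operators in the instance and attribute dimensions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sampling ratios, the noise level, and the choice of applying exactly one operator per dimension are all assumptions made for illustration, and the probabilistic-based sampling operators described in the abstract are replaced here by plain uniform sampling.

```python
import random

# All names and default parameters below are hypothetical, chosen for illustration.

def sub_sample(rows, rng, ratio=0.8):
    """Instance dimension: keep a random subset of data points (no replacement)."""
    k = max(1, int(round(ratio * len(rows))))
    idx = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in idx]

def super_sample(rows, rng, ratio=1.2):
    """Instance dimension: draw more points than the original, with replacement."""
    k = max(1, int(round(ratio * len(rows))))
    return [rows[rng.randrange(len(rows))] for _ in range(k)]

def inject_noise(rows, rng, sigma=0.05):
    """Instance dimension: add zero-mean Gaussian noise to every value."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

def attribute_sample(rows, rng, ratio=0.7):
    """Attribute dimension: keep a random subset of columns."""
    m = len(rows[0])
    k = max(1, int(round(ratio * m)))
    cols = sorted(rng.sample(range(m), k))
    return [[row[c] for c in cols] for row in rows]

INSTANCE_OPS = [sub_sample, super_sample, inject_noise]
ATTRIBUTE_OPS = [attribute_sample]

def random_combination(rows, rng):
    """Apply one randomly chosen instance operator, then one attribute operator."""
    instance_op = rng.choice(INSTANCE_OPS)
    attribute_op = rng.choice(ATTRIBUTE_OPS)
    return attribute_op(instance_op(rows, rng), rng)

def make_ensemble_inputs(rows, n_members=5, seed=0):
    """Generate one transformed dataset per ensemble member."""
    rng = random.Random(seed)
    return [random_combination(rows, rng) for _ in range(n_members)]

if __name__ == "__main__":
    # Toy 10 x 4 dataset: 10 data points, 4 attributes.
    P = [[float(i + j) for j in range(4)] for i in range(10)]
    for member in make_ensemble_inputs(P):
        print(len(member), "points x", len(member[0]), "attributes")
```

Each transformed dataset would then be passed to a base clustering algorithm. A full ensemble would also need to record which original points and attributes each member kept, so that the resulting partitions can be mapped back to P before a consensus step; that bookkeeping is omitted here.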