A study of embedding methods under the evidence accumulation framework

Authors:
Helena Aidos; Ana Fred
Affiliations:
Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal
Venue:
SIMBAD'11: Proceedings of the First International Conference on Similarity-Based Pattern Recognition
Year:
2011

Abstract

In this paper we address the voting mechanism that combines clustering ensembles into the so-called co-association matrix, under the Evidence Accumulation Clustering framework. Different clustering techniques can be applied to this matrix to obtain the combined data partition, but different clustering strategies may yield markedly different combination results. We propose applying embedding methods to this matrix in an attempt to reduce the sensitivity of the final partition to the clustering method while still obtaining competitive and consistent results. We present a study of several embedding methods applied to this matrix, interpreting it in two ways: (i) as a feature space and (ii) as a similarity space. In the first case we reduce the dimensionality of the feature space; in the second case we obtain a representation constrained by the similarity matrix. We then apply several clustering techniques to these new representations and evaluate the impact of the transformations in terms of the performance and coherence of the resulting data partitions. Experimental results on synthetic and real benchmark datasets show that extracting the relevant features through dimensionality reduction yields more consistent results than applying the clustering algorithms directly to the co-association matrix.
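The voting step described in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the co-association matrix, in which entry (i, j) is the fraction of partitions in the ensemble that place objects i and j in the same cluster. The function name `co_association`, the toy label vectors, and the choice of `1 - C` as a dissimilarity are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n co-association matrix for a list of label vectors.

    Entry (i, j) is the fraction of partitions in which objects i and j
    receive the same cluster label (assumed definition, see lead-in).
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same n objects")
        for i in range(n):
            counts[i][i] += 1  # an object always co-occurs with itself
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1  # one vote for the pair (i, j)
                counts[j][i] += 1  # keep the matrix symmetric
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Hypothetical toy ensemble: three partitions of five objects.
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 2],
]
C = co_association(ensemble)

# Interpretation (i): each row of C is a feature vector for one object,
# so a dimensionality reduction method could be applied to these rows.
features = C

# Interpretation (ii): C is a similarity matrix; 1 - C is one possible
# dissimilarity that a similarity-constrained embedding could consume.
dissimilarity = [[1.0 - c for c in row] for row in C]
```

Either `features` or `dissimilarity` could then be passed to an embedding method and the result clustered; which embedding and clustering methods the authors actually used is not stated in the abstract.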