A clustering framework for unbalanced partitioning and outlier filtering on high dimensional datasets

Authors:
Turgay Tugay Bilgin;A. Yilmaz Camurcu
Affiliations:
Department of Computer Engineering, Maltepe University, Maltepe, Istanbul, Turkey;Department of Electronics and Computer Education, Marmara University, Kadikoy, Istanbul, Turkey
Venue:
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
Year:
2007

Abstract

In this study, we propose an improved relationship-based clustering framework for unbalanced clustering and outlier filtering on high-dimensional datasets. The original relationship-based clustering framework is built on METIS, a weighted graph partitioning system, and has two major drawbacks: it performs no outlier filtering, and it forces clusters to be balanced. Our proposed framework instead uses Graclus, an unbalanced partitioning system based on kernel k-means. It makes two major improvements over the original framework. First, we introduce a new space consisting of tiny unbalanced partitions created using Graclus, which we call the micro-partition space, and we filter out singletons and micro-partitions that have fewer members than a threshold value. Second, we agglomerate the filtered micro-partition space and apply Graclus again for clustering. The results are visualized with CLUSION. Our experiments show that the proposed framework produces promising results on high-dimensional datasets.
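The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_size` parameter, and the input format (one micro-partition id per data point, as an unbalanced partitioner such as Graclus might produce) are assumptions made for illustration, and the grouping shown is only one possible reading of "agglomerate".

```python
from collections import Counter


def filter_micro_partitions(labels, min_size):
    """Split point indices into kept points and filtered outliers.

    labels[i] is the micro-partition id of point i (hypothetical input
    format). A micro-partition with fewer than min_size members,
    including every singleton when min_size >= 2, is treated as outliers.
    """
    sizes = Counter(labels)
    kept = [i for i, lab in enumerate(labels) if sizes[lab] >= min_size]
    outliers = [i for i, lab in enumerate(labels) if sizes[lab] < min_size]
    return kept, outliers


def group_by_micro_partition(labels, kept):
    """Group the surviving point indices by micro-partition id.

    Assumption: each surviving micro-partition becomes one unit that a
    second clustering pass would operate on.
    """
    groups = {}
    for i in kept:
        groups.setdefault(labels[i], []).append(i)
    return groups


if __name__ == "__main__":
    # Toy labels: partitions 1 and 3 are singletons.
    labels = [0, 0, 0, 1, 2, 2, 3]
    kept, outliers = filter_micro_partitions(labels, min_size=2)
    print(kept, outliers)
    print(group_by_micro_partition(labels, kept))
```

In this sketch the second clustering pass is left out, since it would depend on how the partitioner is invoked, which the abstract does not specify.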