An auto-stopped hierarchical clustering algorithm integrating outlier detection algorithm

Authors:
Tian-yang Lv; Tai-xue Su; Zheng-xuan Wang; Wan-li Zuo
Affiliations:
College of Computer Science and Technology, Jilin University, Changchun, China (all authors)
Venue:
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Year:
2005

Abstract

Selecting appropriate parameter values is a critical problem for clustering analysis techniques. Moreover, clustering algorithms lack an effective mechanism for detecting outliers and instead treat them as “noise”. Regarding outliers as valuable information, the paper proposes a novel hierarchical clustering algorithm that integrates a new outlier-mining method. The algorithm stops clustering according to the dissimilarity reflected by the detected outliers and needs only one parameter, whose appropriate value can be determined during the outlier-mining process. After discussing some related topics, the paper uses 5 real-life datasets to evaluate the algorithm's performance in outlier mining and clustering and compares it with other algorithms.