A Hybrid Approach to Clustering in Very Large Databases

Authors:
Aoying Zhou; Weining Qian; Hailei Qian; Jin Wen; Shuigeng Zhou; Ye Fan
Venue:
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Year:
2001

Abstract

Current clustering methods commonly suffer from the following problems: 1) high I/O cost and expensive maintenance; 2) the need to pre-specify the uncertain parameter k; 3) poor efficiency in handling clusters of arbitrary shape in very large data sets. In this paper, we first present a hybrid clustering algorithm to solve these problems. It combines distance-based and density-based strategies and makes full use of statistical information while maintaining good cluster quality. The experimental results show that our algorithm outperforms other popular algorithms in terms of efficiency and cost, and that it achieves even greater speedup as the data size scales up.