Clustering high-dimensional data using an efficient and effective data space reduction

Authors:
Ratko Orlandic;Ying Lai;Wai Gen Yee
Affiliations:
Univ. of Illinois at Springfield, Springfield, IL;Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Citing 12
Cited 3

FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The design of a retrieval technique for high-dimensional data on tertiary storage

ACM SIGMOD Record
Evaluation of hierarchical clustering algorithms for document datasets

Proceedings of the eleventh international conference on Information and knowledge management
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Effective Management of Hierarchical Storage Using Two Levels of Data Clustering

MSS '03 Proceedings of the 20 th IEEE/11 th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
A Scalable Parallel Subspace Clustering Algorithm for Massive Data Sets

ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Clustering High Dimensional Massive Scientific Datasets

SSDBM '01 Proceedings of the 13th International Conference on Scientific and Statistical Database Management

Exploiting parallelism to support scalable hierarchical clustering

Journal of the American Society for Information Science and Technology
Special Section: Point-Based Graphics: Fast vector quantization for efficient rendering of compressed point-clouds

Computers and Graphics
High-dimensional similarity search using data-sensitive space partitioning

DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper introduces a new algorithm for clustering data in high-dimensional feature spaces, called GARDENHD. The algorithm is organized around the notion of data space reduction, i.e. the process of detecting dense areas (dense cells) in the space. It performs effective and efficient elimination of empty areas that characterize typical high-dimensional spaces and an efficient adjacency-connected agglomeration of dense cells into larger clusters. It produces a compact representation that can effectively capture the essence of data. GARDENHD is a hybrid of cell-based and density-based clustering. However, unlike typical clustering methods in its class, it applies a recursive partition of sparse regions in the space using a new space-partitioning strategy. The properties of this partitioning strategy greatly facilitate data space reduction. The experiments on synthetic and real data sets reveal that GARDENHD and its data space reduction are effective, efficient, and scalable.