A Fast Algorithm to Cluster High Dimensional Basket Data

Authors:
Carlos Ordonez; Edward Omiecinski; Norberto Ezquerra
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Abstract

Clustering is a data mining problem that has received significant attention from the database community. Data set size, dimensionality, and sparsity have been identified as aspects that make clustering more difficult. This work introduces a fast algorithm to cluster large binary data sets in which data points have high dimensionality and most of their coordinates are zero. This is the case with basket data: transactions containing items, which can be represented as sparse binary vectors with very high dimensionality. The experimental section evaluates the performance, advantages, and limitations of the proposed approach.
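
To make the data representation concrete, the following minimal Python sketch shows how basket transactions might be encoded as sparse binary vectors. It illustrates only the input format described in the abstract, not the clustering algorithm itself. The function names (`encode_baskets`, `to_dense`, `density`) and the sample baskets are hypothetical and do not come from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple


def encode_baskets(
    baskets: List[List[str]],
) -> Tuple[Dict[str, int], List[FrozenSet[int]]]:
    """Assign each distinct item a column index and store every basket
    as the set of column indices whose coordinate equals 1."""
    item_index: Dict[str, int] = {}
    rows: List[FrozenSet[int]] = []
    for basket in baskets:
        cols = set()
        for item in basket:
            if item not in item_index:
                # First time this item is seen: give it the next free column.
                item_index[item] = len(item_index)
            cols.add(item_index[item])
        rows.append(frozenset(cols))
    return item_index, rows


def to_dense(row: FrozenSet[int], d: int) -> List[int]:
    """Expand a sparse row into a full binary vector of length d."""
    return [1 if j in row else 0 for j in range(d)]


def density(rows: List[FrozenSet[int]], d: int) -> float:
    """Fraction of coordinates equal to 1 across all rows."""
    return sum(len(r) for r in rows) / (len(rows) * d)


# Hypothetical sample transactions (illustrative only).
baskets = [["milk", "bread"], ["beer"], ["bread", "eggs", "milk"]]
item_index, rows = encode_baskets(baskets)
d = len(item_index)  # dimensionality = number of distinct items

dense_last = to_dense(rows[2], d)
fraction_nonzero = density(rows, d)
```

In a realistic store catalog the number of distinct items would be in the thousands while each basket holds only a handful of them, so storing just the nonzero column indices keeps memory and per-point work proportional to basket size rather than to the dimensionality.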