A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets

Authors:
Alexander Strehl; Joydeep Ghosh
Venue:
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Year:
2000



Abstract

This paper presents Opossum, a novel similarity-based clustering approach based on constrained, weighted graph partitioning. Opossum is particularly attuned to real-life market baskets, which are characterized by very high-dimensional, highly sparse customer-product matrices with positive ordinal attribute values and a significant number of outliers. Because it is built on top of Metis, a well-known and highly efficient graph-partitioning algorithm, it inherits that algorithm's scalability and ease of parallelization. Results are presented on a real retail-industry data set of several thousand customers and products, with the help of Clusion, a cluster visualization tool.
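
The idea of clustering as balanced, weighted graph partitioning can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is assumed here, Metis is not called, and all function names and the toy baskets are hypothetical. The sketch only builds a similarity graph from sparse baskets and scores a candidate partition by its edge cut and its balance, the two quantities a constrained partitioner trades off.

```python
from itertools import combinations
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse baskets given as {product: quantity}.
    Assumption: the abstract does not say which similarity measure is used."""
    dot = sum(q * b[p] for p, q in a.items() if p in b)
    norm_a = sqrt(sum(q * q for q in a.values()))
    norm_b = sqrt(sum(q * q for q in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(baskets, threshold=0.0):
    """Weighted edges {(i, j): similarity} for customer pairs above the threshold."""
    edges = {}
    for i, j in combinations(range(len(baskets)), 2):
        s = cosine(baskets[i], baskets[j])
        if s > threshold:
            edges[(i, j)] = s
    return edges

def edge_cut(edges, labels):
    """Total weight of edges whose endpoints fall in different clusters."""
    return sum(w for (i, j), w in edges.items() if labels[i] != labels[j])

def imbalance(labels, weights, k):
    """Heaviest cluster weight divided by the ideal (perfectly even) weight."""
    totals = [0.0] * k
    for label, w in zip(labels, weights):
        totals[label] += w
    return max(totals) / (sum(weights) / k)

# Hypothetical toy data: four customers, two obvious purchasing patterns.
baskets = [
    {"milk": 2, "bread": 1},
    {"milk": 1, "bread": 2},
    {"nails": 5, "hammer": 1},
    {"nails": 3, "hammer": 2},
]
edges = similarity_graph(baskets)
weights = [1.0] * len(baskets)   # one unit of weight per customer

good = [0, 0, 1, 1]              # groups similar customers together
bad = [0, 1, 0, 1]               # splits every similar pair
print(edge_cut(edges, good), edge_cut(edges, bad))
print(imbalance(good, weights, 2))
```

A partitioner such as Metis searches for the labeling with a small edge cut while keeping the imbalance below a chosen bound; here the two candidate labelings are simply compared by hand.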