Relationship-Based Clustering and Visualization for High-Dimensional Data Mining

Authors:
Alexander Strehl;Joydeep Ghosh
Affiliations:
-;-
Venue:
INFORMS Journal on Computing
Year:
2003

Abstract

In several real-life data-mining applications, data reside in very high-dimensional spaces (1000 or more dimensions). In such spaces, both the clustering techniques developed for low-dimensional spaces (k-means, BIRCH, CLARANS, CURE, DBSCAN, etc.) and visualization methods such as parallel coordinates or projective visualizations are rendered ineffective. This paper proposes a relationship-based approach that alleviates both problems, side-stepping the "curse of dimensionality" by working in a suitable similarity space instead of the original high-dimensional attribute space. This intermediary similarity space can be tailored to satisfy business criteria, such as requiring customer clusters to represent comparable amounts of revenue. We apply efficient and scalable graph-partitioning-based clustering techniques in this space. The output of the clustering algorithm is used to re-order the data points so that the resulting permuted similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. While two-dimensional visualization of a similarity matrix is not novel by itself, its combination with the order-sensitive partitioning of a graph that captures the relevant similarity measure between objects provides three powerful properties: (i) the high dimensionality of the data does not affect further processing once the similarity space is formed; (ii) it leads to clusters of (approximately) equal importance; and (iii) related clusters show up adjacent to one another, further facilitating the visualization of results. The visualization is very helpful for assessing and improving a clustering. For example, actionable recommendations for splitting or merging clusters can be easily derived, and the visualization also guides the user toward the right number of clusters. Results are presented on a real retail-industry dataset of several thousand customers and products, as well as on clustering of web-document collections and of web-log sessions.
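The re-ordering idea can be illustrated with a minimal, stdlib-only sketch. It makes several assumptions that the abstract does not state. Extended Jaccard similarity is assumed as the similarity measure. Cluster labels are taken as given, since the graph partitioner itself is not reproduced. A greedy "most similar next cluster" chain is a hypothetical stand-in for the cluster order that the paper obtains from its order-sensitive partitioning. An ASCII heat map replaces the two-dimensional image. All function names (`extended_jaccard`, `order_clusters`, `permute`, `render`) are illustrative, not from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def extended_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed measure: x.y / (|x|^2 + |y|^2 - x.y); lies in [0, 1] for non-negative data."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # denom is 0 only when both vectors are all-zero; treat that pair as dissimilar.
    return dot / denom if denom > 0 else 0.0


def similarity_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """n x n matrix: its size depends on the number of points, not on the number of attributes."""
    n = len(rows)
    return [[extended_jaccard(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def order_clusters(S: Matrix, labels: Sequence[int]) -> List[int]:
    """Hypothetical greedy chain: append the unplaced cluster most similar to the last placed one."""
    clusters = sorted(set(labels))
    members: Dict[int, List[int]] = {
        c: [i for i, lab in enumerate(labels) if lab == c] for c in clusters
    }

    def mean_sim(a: int, b: int) -> float:
        vals = [S[i][j] for i in members[a] for j in members[b]]
        return sum(vals) / len(vals)

    order = [clusters[0]]
    remaining = clusters[1:]
    while remaining:
        nxt = max(remaining, key=lambda c: mean_sim(order[-1], c))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def permute(S: Matrix, labels: Sequence[int], cluster_order: Sequence[int]) -> Tuple[List[int], Matrix]:
    """Reorder points so each cluster is contiguous, following cluster_order."""
    rank = {c: k for k, c in enumerate(cluster_order)}
    perm = sorted(range(len(labels)), key=lambda i: (rank[labels[i]], i))
    return perm, [[S[i][j] for j in perm] for i in perm]


def render(P: Matrix, shades: str = " .:#") -> str:
    """ASCII heat map: denser characters mean higher similarity (values assumed in [0, 1])."""
    top = len(shades) - 1
    return "\n".join(
        "".join(shades[min(top, int(v * top + 0.5))] for v in row) for row in P
    )


# Usage with made-up purchase counts (rows = customers, columns = products).
data = [
    [5, 0, 0, 1], [4, 1, 0, 0],  # hypothetical customers buying mostly product 0
    [0, 0, 6, 5], [0, 1, 5, 6],  # hypothetical customers buying mostly products 2 and 3
    [4, 0, 1, 0],
]
labels = [0, 0, 1, 1, 0]  # assumed output of some clustering step
S = similarity_matrix(data)
perm, P = permute(S, labels, order_clusters(S, labels))
print(perm)
print(render(P))
```

After permutation, the members of each cluster occupy a contiguous block of rows and columns, so clusters appear as dense diagonal blocks. Once `S` is built, nothing downstream touches the original attributes, which mirrors property (i). The cost is an O(n²) similarity matrix.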