Hypergraph Models and Algorithms for Data-Pattern-Based Clustering

Authors:
Muhammet Mustafa Ozdal;Cevdet Aykanat
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, USA;Computer Engineering Department, Bilkent University, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2004

Citing 29
Cited 10

Implementing agglomerative hierarchic clustering algorithms for use in document retrieval

Information Processing and Management: an International Journal
Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Combinatorial algorithms for integrated circuit layout

Combinatorial algorithms for integrated circuit layout
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Multilevel hypergraph partitioning: application in VLSI domain

DAC '97 Proceedings of the 34th annual Design Automation Conference
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A comparative study of clustering methods

Future Generation Computer Systems - Special double issue on data mining
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication

IEEE Transactions on Parallel and Distributed Systems
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
CACTUS—clustering categorical data using summaries

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Data clustering: a review

ACM Computing Surveys (CSUR)
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A robust and scalable clustering algorithm for mixed type attributes in large database environment

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Beyond Market Baskets: Generalizing Association Rules to Dependence Rules

Data Mining and Knowledge Discovery
Constraint-Based Rule Mining in Large, Dense Databases

Data Mining and Knowledge Discovery
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Clustering Large Datasets in Arbitrary Metric Spaces

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
ROCK: A Robust Clustering Algorithm for Categorical Attributes

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Graphs and Hypergraphs

Graphs and Hypergraphs

Compression, Clustering, and Pattern Discovery in Very High-Dimensional Discrete-Attribute Data Sets

IEEE Transactions on Knowledge and Data Engineering
New challanges in dynamic load balancing

Applied Numerical Mathematics - Adaptive methods for partial differential equations and large-scale computation
Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis

ACM Transactions on Mathematical Software (TOMS)
Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data

IEEE Transactions on Knowledge and Data Engineering
Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

Journal of Parallel and Distributed Computing
Hypergraph partitioning for document clustering: a unified clique perspective

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Selective Replicated Declustering for Arbitrary Queries

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
A link-based storage scheme for efficient aggregate query processing on clustered road networks

Information Systems
New challenges in dynamic load balancing

Applied Numerical Mathematics - Adaptive methods for partial differential equations and large-scale computation
Efficient successor retrieval operations for aggregate query processing on clustered road networks

Information Sciences: an International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns through modifying the conventional pattern semantics. By clustering the patterns in the dataset, we infer a clustering of the transactions represented this way. For this, we propose a novel hypergraph model to represent the relations among the patterns. Instead of a local measure that depends only on common items among patterns, we propose a global measure that is based on the cooccurences of these patterns in the overall data. The success of existing hypergraph partitioning based algorithms in other domains depends on sparsity of the hypergraph and explicit objective metrics. For this, we propose a two-phase clustering approach for the above hypergraph, which is expected to be dense. In the first phase, the vertices of the hypergraph are merged in a multilevel algorithm to obtain large number of high quality clusters. Here, we propose new quality metrics for merging decisions in hypergraph clustering specifically for this domain. In order to enable the use of existing metrics in the second phase, we introduce a vertex-to-cluster affinity concept to devise a method for constructing a sparse hypergraph based on the obtained clustering. The experiments we have performed show the effectiveness of the proposed framework.