A graph-based clustering algorithm in large transaction databases

Authors:
Ning Chen;An Chen;Longxiang Zhou;Liu Lu
Affiliations:
Mathematics Institute, Chinese Academy of Sciences, Beijing 100080, P.R. China;Economics and Management School, Beijing University of Aeronautics & Astronautics, Beijing 100083, P.R. China. E-mail: anchen1@yahoo.com;Economics and Management School, Beijing University of Aeronautics & Astronautics, Beijing 100083, P.R. China. E-mail: anchen1@yahoo.com;Economics and Management School, Beijing University of Aeronautics & Astronautics, Beijing 100083, P.R. China. E-mail: anchen1@yahoo.com
Venue:
Intelligent Data Analysis
Year:
2001

Abstract

Clustering in transaction databases can reveal potentially useful patterns for improving product profitability. Unfortunately, most clustering algorithms based on metric distances are not appropriate for transaction data. In this paper, we study the problem of item clustering in large transaction databases. We first define a similarity measure between items based on the large itemsets present in the transaction database; the measure captures the co-occurrence relationship between items while remaining insensitive to noise. We then represent the similarity relationship as an undirected graph and transform the clustering problem into that of discovering the connected components of the graph. We also discuss the evaluation of clustering quality and develop an automatic optimizer that searches for the optimal thresholds, yielding the item clustering that optimizes quality.
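
The pipeline outlined in the abstract (item similarity from large itemsets, an undirected similarity graph, clusters as connected components) can be illustrated with a minimal Python sketch. The abstract does not state the paper's actual similarity definition, so the measure used here is an assumption: the support of a large 2-itemset divided by the smaller of the two item supports. The function name `cluster_items` and the parameters `min_support` and `min_similarity` are likewise hypothetical, and connected components are found with a simple union-find rather than any method confirmed by the source.

```python
from collections import Counter
from itertools import combinations


def cluster_items(transactions, min_support=2, min_similarity=0.5):
    """Cluster items as connected components of a similarity graph.

    Assumed similarity (not taken from the paper):
        sim(a, b) = support({a, b}) / min(support(a), support(b)),
    computed only for pairs whose support reaches min_support.
    """
    item_count = Counter()
    pair_count = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    # Union-find structure: each item starts in its own component.
    parent = {item: item for item in item_count}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in pair_count.items():
        if count < min_support:
            continue  # pair is not a large 2-itemset; treated as noise
        similarity = count / min(item_count[a], item_count[b])
        if similarity >= min_similarity:
            parent[find(a)] = find(b)  # add an edge: merge the two components

    clusters = {}
    for item in item_count:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


transactions = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["beer", "chips"],
    ["beer", "chips", "milk"],
]
clusters = cluster_items(transactions)
loose_clusters = cluster_items(transactions, min_support=1)
```

With the default thresholds, `milk` co-occurs with each other item only once, so no edge touches it and it stays in its own cluster. Lowering `min_support` to 1 lets those single co-occurrences become edges and merges everything into one component, which is the kind of threshold sensitivity an automatic optimizer would have to navigate.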