A graph-based clustering algorithm in large transaction databases

  • Authors:
  • Ning Chen; An Chen; Longxiang Zhou; Liu Lu

  • Affiliations:
  • Mathematics Institute, Chinese Academy of Sciences, Beijing 100080, P.R. China; Economics and Management School, Beijing University of Aeronautics & Astronautics, Beijing 100083, P.R. China. E-mail: anchen1@yahoo.com

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2001

Abstract

Clustering in transaction databases can reveal potentially useful patterns that help improve product profit. Unfortunately, most clustering algorithms based on metric distances are not appropriate for transaction data. In this paper, we study the problem of item clustering in large transaction databases. We first define a similarity measure between items based on the large itemsets present in a transaction database; the measure captures the co-occurrence relationship of items while remaining insensitive to noise. We then represent the similarity relationship as an undirected graph and transform the clustering problem into that of discovering the connected components of the graph. We also discuss how to evaluate clustering quality and develop an automatic optimizer that searches for the optimal thresholds, yielding the item clustering that optimizes the quality.
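
The graph formulation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's similarity definition, so the sketch substitutes assumptions of its own: only large 2-itemsets are considered (pairs that co-occur in at least `min_support` transactions), and similarity is a Jaccard-style ratio of pair support to the support of either item. The function name `cluster_items` and both threshold parameters are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cluster_items(transactions, min_support=2, min_similarity=0.5):
    """Cluster items as connected components of a similarity graph.

    Assumption (not from the paper): similarity of items a and b is
    support(a, b) / (support(a) + support(b) - support(a, b)),
    computed only for pairs that are large 2-itemsets.
    """
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates within a transaction
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    # Union-find structure: each item starts as its own component.
    parent = {item: item for item in item_count}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an undirected edge for each sufficiently similar large pair.
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue  # pair is not a large itemset; treated as noise
        similarity = n_ab / (item_count[a] + item_count[b] - n_ab)
        if similarity >= min_similarity:
            parent[find(a)] = find(b)

    # Collect the connected components as clusters.
    clusters = defaultdict(set)
    for item in item_count:
        clusters[find(item)].add(item)
    return sorted(sorted(members) for members in clusters.values())


transactions = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["beer", "chips"],
    ["beer", "chips"],
    ["jam"],
]
print(cluster_items(transactions))
```

In this sketch the support filter is what provides noise tolerance: a pair seen in a single transaction never produces an edge. The two thresholds correspond loosely to the parameters the abstract says an optimizer would tune; how the paper scores clustering quality is not stated in the abstract, so no search procedure is shown.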