A hierarchical clustering algorithm for categorical sequence data

  • Authors:
  • Seung-Joon Oh; Jae-Yearn Kim

  • Affiliations:
  • Department of Industrial Engineering, Hanyang University, 17 Haengdang-Dong, Seongdong-Ku, Seoul 133-791, Republic of Korea (both authors)

  • Venue:
  • Information Processing Letters
  • Year:
  • 2004

Abstract

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. In this paper, we study how to cluster such sequence datasets. We propose a new similarity measure for comparing two sequences and develop a hierarchical clustering algorithm. Using a splice dataset and synthetic datasets, we show that our proposed approach produces clusters of higher quality than those produced by traditional clustering algorithms.
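The abstract does not define the proposed similarity measure, so the following is only a minimal sketch of the general idea of hierarchically clustering categorical sequences. It assumes a stand-in similarity (length of the longest common subsequence divided by the length of the longer sequence) and average linkage. Both choices, and the names `lcs_length`, `similarity`, and `agglomerative`, are hypothetical illustrations and are not the authors' method.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match; otherwise keep the best of left/up.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """ASSUMED stand-in measure (not the paper's): LCS length / longer length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def agglomerative(seqs, k):
    """Bottom-up clustering with average linkage until k clusters remain.

    Returns clusters as lists of indices into `seqs`.
    """
    clusters = [[i] for i in range(len(seqs))]
    # Precompute pairwise similarities, keyed by (smaller index, larger index).
    sim = {(i, j): similarity(seqs[i], seqs[j])
           for i, j in combinations(range(len(seqs)), 2)}

    def link(c1, c2):
        # Average similarity over all cross-cluster pairs.
        total = sum(sim[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Pick the most similar pair of clusters (p < q from combinations).
        p, q = max(combinations(range(len(clusters)), 2),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    seqs = ["ABCD", "ABCE", "XYZW", "XYZV"]
    print(agglomerative(seqs, 2))  # [[0, 1], [2, 3]]
```

Sequences may be strings or lists of any hashable categorical symbols. The quadratic pairwise table keeps the sketch short; a practical implementation would swap in the paper's own measure and a more efficient merge strategy.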