A hierarchical clustering algorithm for categorical sequence data

Authors:
Seung-Joon Oh;Jae-Yearn Kim
Affiliations:
Department of Industrial Engineering, Hanyang University, 17 Haengdang-Dong, Seongdong-Ku, Seoul 133-791, Republic of Korea;Department of Industrial Engineering, Hanyang University, 17 Haengdang-Dong, Seongdong-Ku, Seoul 133-791, Republic of Korea
Venue:
Information Processing Letters
Year:
2004

Abstract

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. In this paper, we study how to cluster these sequence datasets. We propose a new similarity measure to compute the similarity between two sequences and develop a hierarchical clustering algorithm. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our proposed approach is better than that of clusters produced by traditional clustering algorithms.
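
The abstract does not specify the proposed similarity measure or the details of the clustering procedure, so the following is only a generic sketch of the overall idea: compute a pairwise similarity between categorical sequences, then merge clusters bottom-up. As a stand-in (an assumption, not the authors' measure), similarity here is the length of the longest common subsequence divided by the length of the longer sequence, and clusters are merged by average linkage. All function names are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Stand-in similarity in [0, 1]; NOT the measure proposed in the paper."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def average_link(c1, c2, seqs):
    """Mean pairwise similarity between two clusters of sequence indices."""
    total = sum(similarity(seqs[i], seqs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(seqs, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        # combinations yields index pairs with a < b, so deleting b is safe.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], seqs),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Toy DNA-like sequences, made up for illustration only.
    seqs = ["ACGTAC", "ACGTTC", "ACGAAC", "TTTGGG", "TTTGGA", "TTAGGG"]
    print(agglomerative(seqs, k=2))
```

On the toy input, the first three sequences and the last three sequences end up in separate clusters. This brute-force version recomputes similarities at every merge and is meant for illustration on small inputs, not as an efficient implementation.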