An efficient similarity measure for clustering of categorical sequences

Authors:
Sang-Kyun Noh;Yong-Min Kim;DongKook Kim;Bong-Nam Noh
Affiliations:
Interdisciplinary Program of Information Security, Chonnam National University, Korea;Dept. of Electronic Commerce, Chonnam National University, Korea;Div. of Electronics Computer Engineering, Chonnam National University, Korea;Div. of Electronics Computer Engineering, Chonnam National University, Korea
Venue:
AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
Year:
2006

Abstract

In this paper, we propose an efficient similarity measure as a pre-processing method for clustering categorical and sequential attributes. The similarity measure is based on a new dynamic programming algorithm that computes a sequence comparison score from a gap penalty matrix; the measure itself is obtained by normalizing this score. The proposed measure is evaluated through clustering experiments, since clustering is an unsupervised learning method whose results are strongly influenced by the similarity measure used between clusters. The experiments use Tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets, which consist of sequential packet data captured on a network. Finally, the results of comparison experiments are discussed.
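
The abstract does not give the algorithm itself, so the following is only a rough sketch of the general idea of a normalized, dynamic-programming sequence comparison score. It substitutes the standard Needleman-Wunsch global alignment with a single linear gap penalty for the paper's own algorithm and gap penalty matrix. The function names, the scoring constants (match, mismatch, gap), and the min-max normalization are all assumptions made for illustration, not the authors' definitions.

```python
def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score of two categorical sequences.

    Standard Needleman-Wunsch with a linear gap penalty (an assumed
    stand-in, not the paper's algorithm). Keeps only two DP rows.
    """
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # Best of: align the two symbols, or insert a gap in either sequence.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def similarity(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Normalize the alignment score to [0, 1] (assumed min-max scheme).

    Assumes match > 0 and gap < 0. The all-gap alignment gives a lower
    bound on the score; a full-length run of matches gives an upper bound.
    """
    if not a and not b:
        return 1.0
    lowest = gap * (len(a) + len(b))
    highest = match * max(len(a), len(b))
    score = alignment_score(a, b, match, mismatch, gap)
    return (score - lowest) / (highest - lowest)


# Hypothetical categorical sequences, e.g. per-connection packet types.
session_a = ["SYN", "SYN-ACK", "ACK", "PSH", "FIN"]
session_b = ["SYN", "SYN-ACK", "ACK", "FIN"]
print(similarity(session_a, session_b))
```

A pairwise matrix of such similarities could then be handed to a clustering algorithm as its pre-computed input, which is the pre-processing role the abstract describes.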