An efficient similarity measure for clustering of categorical sequences

  • Authors:
  • Sang-Kyun Noh; Yong-Min Kim; DongKook Kim; Bong-Nam Noh

  • Affiliations:
  • Interdisciplinary Program of Information Security, Chonnam National University, Korea; Dept. of Electronic Commerce, Chonnam National University, Korea; Div. of Electronics Computer Engineering, Chonnam National University, Korea; Div. of Electronics Computer Engineering, Chonnam National University, Korea

  • Venue:
  • AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
  • Year:
  • 2006

Abstract

In this paper, we propose an efficient similarity measure as a pre-processing method for clustering data with categorical and sequential attributes. The similarity measure is based on a new dynamic programming algorithm that computes a sequence comparison score from a gap penalty matrix; the measure itself is obtained by normalizing this score. The proposed measure is evaluated through clustering experiments, since clustering is an unsupervised learning method whose results are strongly influenced by the similarity measure between clusters. The experiments use Tcpdump data from the DARPA 1999 Intrusion Detection Evaluation Data Sets, which are composed of sequential packet data in a network. Finally, the results of the comparison experiments are discussed.
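
The abstract does not give the recurrence or the gap penalty matrix, so the following is only a minimal sketch of the general idea (a dynamic programming comparison score that is then normalized), not the authors' algorithm. As a stand-in it assumes a standard Needleman-Wunsch global alignment with a constant gap penalty. The function names, the scoring values, the normalization rule, and the packet-flag symbols in the example are all illustrative assumptions.

```python
from typing import Hashable, Sequence


def alignment_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Global alignment score (Needleman-Wunsch, constant gap penalty).

    Assumed stand-in for the paper's scoring; the abstract does not
    specify the actual recurrence or gap penalty matrix.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
        prev = curr
    return prev[len(b)]


def normalized_similarity(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Map the raw score to [0, 1] (hypothetical normalization).

    Divides by the best score the longer sequence could reach and
    clamps negative scores to 0. Assumes match > 0 and that mismatch
    and gap are not positive.
    """
    if not a and not b:
        return 1.0
    best = match * max(len(a), len(b))
    return max(0.0, alignment_score(a, b, match, mismatch, gap)) / best


# Categorical sequences, here made-up TCP flag symbols per packet.
s1 = ["SYN", "SYN-ACK", "ACK", "PSH", "FIN"]
s2 = ["SYN", "SYN-ACK", "ACK", "FIN"]
print(normalized_similarity(s1, s2))
```

With these assumed parameters, identical sequences score 1 and sequences with no symbols in common score 0. A pairwise matrix of such values (or one minus them, as a distance) is the kind of input a clustering algorithm would take.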