Partial Symbol Ordering Distance

  • Authors:
  • Javier Herranz; Jordi Nin

  • Affiliations:
  • Dept. Matemàtica Aplicada IV, Universitat Politècnica de Catalunya, Barcelona 08034, Spain; LAAS (Laboratoire d'Analyse et d'Architecture des Systèmes), CNRS (Centre National de la Recherche Scientifique), Toulouse 31077, France

  • Venue:
  • MDAI '09 Proceedings of the 6th International Conference on Modeling Decisions for Artificial Intelligence
  • Year:
  • 2009

Abstract

Nowadays, sequences of symbols are increasingly important, as they are the standard format for representing information in a wide variety of domains such as ontologies, sequential patterns or non-numerical attributes in databases. Developing new distances for this kind of data is therefore a crucial need. Many similarity functions have recently been proposed for managing sequences of symbols; however, such functions do not always satisfy the triangle inequality. This property is a mandatory requirement in many data mining algorithms, such as clustering or k-nearest neighbors, which assume a metric space. In this paper, we propose a new distance for sequences of (non-repeated) symbols based on the partial distances between the positions of the common symbols. We prove that this Partial Symbol Ordering distance satisfies the triangle inequality, and we describe a set of experiments indicating that the new distance outperforms the Edit distance in scenarios where sequence similarity is related to the positions occupied by the symbols.
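The abstract hinges on the triangle inequality. The Python sketch below is not the authors' Partial Symbol Ordering distance, whose definition is not given here. It only illustrates, under stated assumptions, how one might search for triangle-inequality violations by brute force over a small set of sequences of non-repeated symbols. It compares the standard Levenshtein (Edit) distance, a hypothetical position-based distance `positional_l1` (an L1 distance between symbol-position vectors, where a symbol missing from a sequence is assigned the fixed position `absent`), and the squared Edit distance, a known non-metric. All function names are illustrative.

```python
from itertools import permutations, product
from typing import Callable, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def positional_l1(a: Sequence[str], b: Sequence[str], absent: int) -> int:
    """HYPOTHETICAL position-based distance, NOT the paper's PSO distance.

    Sums |pos_a(s) - pos_b(s)| over the symbols of a and b, treating a missing
    symbol as sitting at position `absent`. If `absent` is at least the maximum
    sequence length, this is an L1 distance between position vectors, hence a metric.
    """
    pos_a = {s: i for i, s in enumerate(a)}
    pos_b = {s: i for i, s in enumerate(b)}
    return sum(abs(pos_a.get(s, absent) - pos_b.get(s, absent))
               for s in set(pos_a) | set(pos_b))


def squared_edit(a: Sequence[str], b: Sequence[str]) -> int:
    """Squared Edit distance: a symmetric dissimilarity that is not a metric."""
    return edit_distance(a, b) ** 2


Seq = Tuple[str, ...]


def violates_triangle(dist: Callable[[Seq, Seq], int],
                      seqs: Sequence[Seq]) -> Optional[Tuple[Seq, Seq, Seq]]:
    """Return a triple (x, y, z) with d(x, z) > d(x, y) + d(y, z), or None if none exists."""
    for x, y, z in product(seqs, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z):
            return (x, y, z)
    return None


# All sequences of non-repeated symbols over {a, b, c} of length 0 to 3.
seqs = [p for r in range(4) for p in permutations("abc", r)]

print(violates_triangle(edit_distance, seqs))                               # None
print(violates_triangle(lambda a, b: positional_l1(a, b, absent=3), seqs))  # None
print(violates_triangle(squared_edit, seqs) is not None)                    # True
```

A brute-force search like this can only expose violations on the sequences it tries; it cannot establish the property in general, which is why a proof, such as the one the paper gives for its distance, is still needed.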