Partial Symbol Ordering Distance

Authors:
Javier Herranz;Jordi Nin
Affiliations:
Dept. Matemàtica Aplicada IV, Universitat Politècnica de Catalunya, Barcelona, (Spain) 08034;LAAS, Laboratoire d'Analyse et d'Architecture des Systèmes, CNRS, Centre National de la Recherche Scientifique, Toulouse, (France) 31077
Venue:
MDAI '09 Proceedings of the 6th International Conference on Modeling Decisions for Artificial Intelligence
Year:
2009

Abstract

Sequences of symbols are increasingly important because they are the standard format for representing information in a wide variety of domains, such as ontologies, sequential patterns, and non-numerical attributes in databases. Developing new distances for this kind of data is therefore a crucial need. Many similarity functions have recently been proposed for sequences of symbols; however, such functions do not always satisfy the triangle inequality. This property is required by many data mining algorithms, such as clustering and k-nearest neighbors, which assume a metric space. In this paper, we propose a new distance for sequences of (non-repeated) symbols based on the partial distances between the positions of the common symbols. We prove that this Partial Symbol Ordering distance satisfies the triangle inequality, and we describe a set of experiments supporting the claim that the new distance outperforms the Edit distance in scenarios where sequence similarity is related to the positions occupied by the symbols.
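The role of the triangle inequality can be illustrated with a small brute-force check. The sketch below is not the Partial Symbol Ordering distance, whose definition is not reproduced in this abstract. It uses the classic Edit (Levenshtein) distance as a known metric, and a deliberately naive position-based function, `naive_position_gap`, invented here purely for illustration, to show how ignoring the non-common symbols can break the inequality. All function names are assumptions of this example.

```python
from itertools import permutations, product


def edit_distance(a, b):
    """Classic Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def naive_position_gap(a, b):
    """Hypothetical, illustration only: total position shift of common symbols.

    Symbols that appear in only one sequence are ignored, which is exactly
    what makes this function fail the triangle inequality.
    """
    pos_b = {s: j for j, s in enumerate(b)}
    return sum(abs(i - pos_b[s]) for i, s in enumerate(a) if s in pos_b)


def triangle_violations(dist, sequences):
    """Return every triple (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z)."""
    return [(x, y, z)
            for x, y, z in product(sequences, repeat=3)
            if dist(x, z) > dist(x, y) + dist(y, z)]


def sequences_without_repeats(alphabet, max_len):
    """All sequences of distinct symbols from alphabet, of length 1..max_len."""
    return [p for n in range(1, max_len + 1) for p in permutations(alphabet, n)]


seqs = sequences_without_repeats("abcd", 3)

# The Edit distance is a metric, so no triple violates the inequality.
print(len(triangle_violations(edit_distance, seqs)))  # prints 0

# The naive function is not a metric: ("a","b") and ("b","a") are at distance 2,
# yet both are at distance 0 from ("c","d"), with which they share no symbol.
bad = triangle_violations(naive_position_gap, seqs)
print((("a", "b"), ("c", "d"), ("b", "a")) in bad)  # prints True
```

An exhaustive check of this kind only covers a small alphabet and cannot replace a proof, but it is a quick way to rule out a candidate function before using it with clustering or k-nearest neighbors algorithms that assume a metric space.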