A Similarity Measure for Sequences of Categorical Data Based on the Ordering of Common Elements

Authors:
Cristina Gómez-Alonso;Aida Valls
Affiliations:
iTAKA Research Group - Intelligent Tech. for Advanced Knowledge Acquisition, Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Tarragona, Spain 43007;iTAKA Research Group - Intelligent Tech. for Advanced Knowledge Acquisition, Department of Computer Science and Mathematics, Universitat Rovira i Virgili, Tarragona, Spain 43007
Venue:
MDAI '08 Sabadell Proceedings of the 5th International Conference on Modeling Decisions for Artificial Intelligence
Year:
2008

Abstract

Similarity measures are commonly used to compare items and to identify pairs or groups of similar individuals. The choice of measure depends strongly on the type of values being compared. We address the case in which each individual is described by a sequence of events (e.g., the sequence of web pages visited by a user, or a person's daily schedule). Several measures exist for numerical sequences, but very few methods handle sequences of categorical data. In this paper, we present a new similarity measure for sequences of categorical labels and compare it with previous approaches.
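
To make the setting concrete, the following is a minimal sketch of one standard way to compare two sequences of categorical labels: a normalized longest common subsequence (LCS) score. This is not the measure proposed in the paper, whose definition is not given in the abstract; it is only a generic baseline that, like the title suggests, rewards elements that are shared and appear in the same relative order. The function names `lcs_length` and `lcs_similarity` and the page labels in the usage lines are hypothetical, chosen for illustration.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b.

    Uses the classic dynamic program, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best common subsequence ending before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip either the current item of a or the current item of b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized LCS score in [0, 1] (illustrative baseline, not the paper's measure).

    Assumption: two empty sequences are treated as identical (score 1.0).
    """
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical browsing sessions, each a sequence of categorical labels.
session_1 = ["home", "news", "sports"]
session_2 = ["home", "sports", "news"]
score = lcs_similarity(session_1, session_2)
```

Here the two sessions share all three labels, but only two of them can be matched in the same order, so the score falls below 1. Dividing by the longer length is one possible normalization among several; it penalizes sequences of very different lengths.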