Extending the Edit Distance Using Frequencies of Common Characters

Authors:
Muhammad Marwan Muhammad Fuad;Pierre-François Marteau
Affiliations:
VALORIA, Université de Bretagne Sud, Vannes, France 56017;VALORIA, Université de Bretagne Sud, Vannes, France 56017
Venue:
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Year:
2008

Citing 9
Cited 2

Efficiently supporting ad hoc queries in large datasets of time sequences

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The String-to-String Correction Problem

Journal of the ACM (JACM)
Locally adaptive dimensionality reduction for indexing large time series databases

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Fast Time Sequence Indexing for Arbitrary Lp Norms

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Efficient Time Series Matching by Wavelets

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
A symbolic representation of time series, with implications for streaming algorithms

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Exact indexing of dynamic time warping

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Time-series data mining

ACM Computing Surveys (CSUR)
ABC-SG: a new artificial bee colony algorithm-based distance of sequential data using sigma grams

AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134

Quantified Score

Hi-index	0.00

Visualization

Abstract

Similarity search of time series has attracted many researchers recently. In this scope, reducing the dimensionality of data is required to scale up the similarity search. Symbolic representation is a promising technique of dimensionality reduction, since it allows researchers to benefit from the richness of algorithms used for textual databases. To improve the effectiveness of similarity search we propose in this paper an extension to the edit distance that we call the extended edit distance. This new distance is applied to symbolic sequential data objects, and we test it on time series data bases in classification task experiments. We also prove that our distance is a metric.