Similarity join in metric spaces

Authors:
Vlastislav Dohnal;Claudio Gennaro;Pasquale Savino;Pavel Zezula
Affiliations:
Masaryk University, Brno, Czech Republic;ISTI-CNR, Pisa, Italy;ISTI-CNR, Pisa, Italy;Masaryk University, Brno, Czech Republic
Venue:
ECIR'03 Proceedings of the 25th European conference on IR research
Year:
2003

Abstract

Similarity join in distance spaces constrained by the metric postulates is the necessary complement to the better-known similarity range and nearest-neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We first study the underlying principles of such joins and suggest three categories of implementation strategies, based on filtering, partitioning, or similarity range searching. We then study an application of the D-index to implement the most promising alternative, range searching. Although this approach cannot eliminate the intrinsic quadratic complexity of similarity joins either, experiments confirm significant performance improvements.
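
To make the filtering idea concrete, the following is a minimal sketch of a similarity self-join over strings, and it is not the authors' algorithm or the D-index. It assumes edit distance as the metric and a single, arbitrarily chosen pivot; the names `edit_distance`, `naive_join`, `pivot_filtered_join`, `mu` (the join threshold), and `pivot` are all hypothetical and chosen only for illustration. The naive version evaluates the distance for every pair, which is the quadratic cost mentioned above. The filtered version uses the triangle inequality, |d(x, p) - d(y, p)| <= d(x, y), to discard pairs without computing their distance.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, a metric on strings (assumed here for illustration)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def naive_join(items, dist, mu):
    """Nested-loop self-join: evaluates dist for all n*(n-1)/2 pairs."""
    return {(x, y) for x, y in combinations(items, 2) if dist(x, y) <= mu}


def pivot_filtered_join(items, dist, mu, pivot):
    """Self-join with a pivot filter (hypothetical helper, items must be unique).

    Since |d(x, pivot) - d(y, pivot)| is a lower bound on d(x, y), any pair
    whose bound exceeds mu cannot qualify and is skipped without a distance
    computation. The number of candidate pairs is still quadratic in the
    worst case; only the number of distance evaluations shrinks.
    """
    to_pivot = {x: dist(x, pivot) for x in items}  # n precomputed distances
    result, computed = set(), 0
    for x, y in combinations(items, 2):
        if abs(to_pivot[x] - to_pivot[y]) > mu:
            continue  # filtered out by the triangle inequality
        computed += 1
        if dist(x, y) <= mu:
            result.add((x, y))
    return result, computed
```

For example, calling `pivot_filtered_join(words, edit_distance, 1, words[0])` on a small list of unique words returns the same pairs as `naive_join(words, edit_distance, 1)`, together with the number of pair distances it actually evaluated. How much is saved depends heavily on the data and on the choice of pivot.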