Similarity join in distance spaces constrained by the metric postulates is a necessary complement to the better-known similarity range and nearest-neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections. We first study the underlying principles of such joins and suggest three categories of implementation strategies, based on filtering, partitioning, or similarity range searching. We then study an application of the D-index to implement the most promising alternative, range searching. Although this approach also cannot eliminate the intrinsic quadratic complexity of similarity joins, experiments confirm significant performance improvements.
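
As an illustration of the filtering category, the following minimal Python sketch contrasts a nested-loop similarity self-join with a variant that prunes pairs using the triangle inequality. It is not the D-index approach described above. The function names, the threshold parameter `mu`, the single `pivot`, and the choice of edit distance as the metric are all assumptions made for this example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, which satisfies the metric postulates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def nested_loop_join(items, dist, mu):
    """Baseline: evaluate the distance for every pair (quadratic)."""
    return {tuple(sorted((x, y)))
            for x, y in combinations(items, 2)
            if dist(x, y) <= mu}


def pivot_filter_join(items, dist, mu, pivot):
    """Hypothetical filtering join using one pivot (an assumption).

    By the triangle inequality, |d(x, p) - d(y, p)| <= d(x, y), so a
    pair whose pivot distances differ by more than mu cannot qualify.
    Returns the result pairs and the number of pair distances evaluated.
    """
    keyed = sorted((dist(x, pivot), x) for x in items)
    result = set()
    evaluated = 0
    for i, (dx, x) in enumerate(keyed):
        for dy, y in keyed[i + 1:]:
            if dy - dx > mu:
                break  # every later item is at least as far from the pivot
            evaluated += 1
            if dist(x, y) <= mu:
                result.add(tuple(sorted((x, y))))
    return result, evaluated


if __name__ == "__main__":
    words = ["index", "indexes", "metric", "metrics", "join", "joins", "range"]
    pairs, evaluated = pivot_filter_join(words, edit_distance, 1, "index")
    print(sorted(pairs), evaluated)
```

The filter only skips distance computations for pairs that cannot qualify, so both functions return the same pairs. In the worst case (all objects equally far from the pivot) nothing is pruned, which is consistent with the abstract's remark that the quadratic complexity is intrinsic.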