Searching a large data set for the strings that are most similar to a given one, according to the edit distance, is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended with a cheap approximate distance function that acts as a filter to quickly discard irrelevant strings. Using the bag distance as an approximation of the edit distance, we show a performance improvement of up to 90% with respect to the basic case. This, along with the fact that our solution is independent of both the distance used in the pre-test and the underlying metric index, demonstrates that metric indices are a powerful solution not only for many modern application areas, such as multimedia, data mining and pattern recognition, but also for the string matching problem.
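The filtering idea can be illustrated with a minimal Python sketch. It assumes the usual multiset-difference definition of the bag distance, which never exceeds the edit distance, so any string whose bag distance to the query is above the search radius can be discarded without computing the edit distance. For brevity the sketch applies the filter in a linear scan rather than inside an M-tree; the function names and the `range_search` interface are hypothetical and not taken from the paper.

```python
from collections import Counter


def bag_distance(x: str, y: str) -> int:
    """Bag distance: the larger of the two multiset differences.

    Assumed definition: max(|ms(x) - ms(y)|, |ms(y) - ms(x)|), where ms(s)
    is the multiset of characters of s. It is a lower bound on the edit distance.
    """
    cx, cy = Counter(x), Counter(y)
    # Counter subtraction keeps only positive counts (multiset difference).
    return max(sum((cx - cy).values()), sum((cy - cx).values()))


def edit_distance(x: str, y: str) -> int:
    """Standard dynamic-programming edit (Levenshtein) distance."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution or match
        prev = curr
    return prev[-1]


def range_search(query: str, strings: list[str], radius: int):
    """Return (matches, number of edit-distance computations).

    Illustrative linear scan (not an M-tree): the cheap bag distance is
    used as a pre-test before the expensive edit distance.
    """
    matches, computed = [], 0
    for s in strings:
        if bag_distance(query, s) > radius:
            continue  # safe to discard: the edit distance is at least as large
        computed += 1
        d = edit_distance(query, s)
        if d <= radius:
            matches.append((s, d))
    return matches, computed


matches, computed = range_search(
    "kitten", ["kitten", "sitting", "mitten", "abcdef"], radius=1)
print(matches, computed)
```

In this toy run the pre-test discards two of the four candidates, so only two edit distances are computed. Because the filter relies only on the lower-bound property, any other cheap lower bound could be substituted for `bag_distance` in this sketch.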