Repositories of unstructured data, such as free text, images, audio and video, have recently emerged in many fields. A general approach to searching such data is similarity search, in which the query retrieves objects similar to a given object and similarity is modeled by a metric distance function. In this article we propose a new dynamic, paged and balanced access method for similarity search in metric data sets, named the CM-tree (Clustered Metric tree). It fully supports dynamic insertion and deletion, both of single objects and in bulk. Unlike other methods, it is designed to produce tight, low-overlap clusters through its primary construction algorithms rather than through post-processing, which yields significantly improved performance. Several new methods are introduced to achieve this: a strategy for selecting the representative objects of nodes, a clustering-based node split algorithm together with criteria for triggering a node split, and an improved sub-tree pruning method used during search. To support these methods, the pairwise distances between the objects of a node are maintained within each node. Results from an extensive experimental study show that the CM-tree outperforms the M-tree and the Slim-tree, improving search performance by up to 312% in I/O cost and 303% in CPU cost.
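To make the idea of sub-tree pruning in a metric index concrete, the following is a minimal sketch of a range query over a generic ball-partitioned tree. It is not the CM-tree's pruning method, which the abstract does not specify; it shows only the standard covering-radius test that follows from the triangle inequality. The names `Node`, `rep`, `radius`, `range_search` and the Euclidean distance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms would do."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Node:
    # Hypothetical node layout, not the CM-tree's actual page format.
    rep: Point        # representative (routing) object of the node
    radius: float     # covering radius: upper bound on the distance from
                      # rep to every object stored in this sub-tree
    objects: List[Point] = field(default_factory=list)    # leaf payload
    children: List["Node"] = field(default_factory=list)  # sub-trees


def range_search(
    node: Node,
    query: Point,
    r: float,
    dist: Callable[[Point, Point], float] = euclidean,
    stats: Optional[Dict[str, int]] = None,
) -> List[Point]:
    """Return every object within distance r of query."""
    if stats is None:
        stats = {"dist_calls": 0}
    stats["dist_calls"] += 1
    d_rep = dist(query, node.rep)
    # Triangle inequality: every object o below this node satisfies
    # d(query, o) >= d(query, rep) - radius, so if that lower bound
    # exceeds r the whole sub-tree can be skipped.
    if d_rep - node.radius > r:
        return []
    hits: List[Point] = []
    for obj in node.objects:
        stats["dist_calls"] += 1
        if dist(query, obj) <= r:
            hits.append(obj)
    for child in node.children:
        hits.extend(range_search(child, query, r, dist, stats))
    return hits
```

In this sketch, tighter and less overlapping clusters mean smaller covering radii, so the lower bound excludes more sub-trees and fewer distance computations and page reads are needed. That is the general motivation for building tight clusters during construction.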