A cost model for similarity queries in metric spaces
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
The choice of reference points in best-match file searching
Communications of the ACM
ACM Computing Surveys (CSUR)
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Similarity Search without Tears: The OMNI Family of All-purpose Access Methods
Proceedings of the 17th International Conference on Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Searching in metric spaces by spatial approximation
The VLDB Journal — The International Journal on Very Large Data Bases
Comparison of AESA and LAESA search algorithms using string and tree-edit-distances
Pattern Recognition Letters
Pivot selection techniques for proximity searching in metric spaces
Pattern Recognition Letters
Approximate searches: k-neighbors + precision
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Index-driven similarity search in metric spaces (Survey Article)
ACM Transactions on Database Systems (TODS)
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search
ACM Transactions on Database Systems (TODS)
Speedup Clustering with Hierarchical Ranking
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
The VLDB Journal — The International Journal on Very Large Data Bases
Trying to outperform a well-known index with a sequential scan
Proceedings of the Joint EDBT/ICDT 2013 Workshops
Hi-index | 0.00 |
Modern biological applications usually involve the similarity comparison between two objects, which is often computationally very expensive, such as whole genome pairwise alignment and protein 3D structure alignment. Nevertheless, being able to quickly identify the closest neighboring objects from very large databases for a newly obtained sequence or structure can provide timely hints to its functions and more. This paper presents a substantial speedup technique for the well-studied k-nearest neighbor (k-nn) search, based on novel concepts of virtual pivots and partial pivots, such that a significant number of the expensive distance computations can be avoided. The new method is able to dynamically locate virtual pivots, according to the query, with increasing pruning ability. Using the same or less amount of database preprocessing effort, the new method outperformed the second best method by using no more than 40 percent distance computations per query, on a database of 10,000 gene sequences, compared to several best known k-nn search methods including M-Tree, OMNI, SA-Tree, and LAESA. We demonstrated the use of this method on two biological sequence data sets, one of which is for HIV-1 viral strain computational genotyping.