Similarity search that leverages distance-based index structures is increasingly used in both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high-dimensional feature vectors. We develop results showing that the character of these biological workloads differs from that of multimedia workloads. In particular, they are not intrinsically very high dimensional and therefore deserve different optimization heuristics. Based on MVP-trees, we develop a center-seeking pivot selection heuristic and show that it outperforms the most widely used corner-seeking heuristic. Similarly, we develop a data partitioning approach that is sensitive to the actual data distribution, in lieu of median splits.
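To make the contrast between corner-seeking and center-seeking pivots concrete, the following is a minimal Python sketch on DNA k-mers with Hamming distance. It is an illustration under stated assumptions, not the heuristic developed in the paper: the corner-seeking variant is shown as a farthest-first traversal, the center-seeking variant is stood in for by a simple medoid (the point with the smallest total distance to the rest), and the function names `corner_pivots`, `center_pivot`, and `range_search` are hypothetical. The range search shows only the generic pivot-based pruning that follows from the triangle inequality, not a full MVP-tree.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def corner_pivots(data, dist, n_pivots):
    """Corner-seeking stand-in: farthest-first traversal.

    Starts from the first point (an arbitrary choice) and repeatedly adds
    the point whose minimum distance to the chosen pivots is largest.
    """
    pivots = [data[0]]
    while len(pivots) < min(n_pivots, len(data)):
        candidates = [x for x in data if x not in pivots]
        pivots.append(max(candidates,
                          key=lambda x: min(dist(x, p) for p in pivots)))
    return pivots


def center_pivot(data, dist):
    """Center-seeking stand-in (assumption): the medoid of the data."""
    return min(data, key=lambda x: sum(dist(x, y) for y in data))


def range_search(data, dist, pivots, query, radius):
    """Return all points within `radius` of `query`, pruning with pivots.

    By the triangle inequality |d(q, p) - d(x, p)| <= d(q, x), so a point
    with |d(q, p) - d(x, p)| > radius for some pivot p cannot be a hit.
    """
    # In a real index these point-to-pivot distances are precomputed once.
    table = {x: [dist(x, p) for p in pivots] for x in data}
    q_to_p = [dist(query, p) for p in pivots]
    hits = []
    for x in data:
        if any(abs(qp - xp) > radius for qp, xp in zip(q_to_p, table[x])):
            continue  # pruned without computing d(query, x)
        if dist(query, x) <= radius:
            hits.append(x)
    return hits


if __name__ == "__main__":
    kmers = ["AAAA", "AAAT", "AATT", "ATTT", "TTTT"]
    print(corner_pivots(kmers, hamming, 2))
    print(center_pivot(kmers, hamming))
    print(range_search(kmers, hamming, [center_pivot(kmers, hamming)], "AAAA", 1))
```

On this toy set the farthest-first pivots land on the two extremes of the data, while the medoid sits in the middle. Which choice prunes more in practice depends on the distance distribution of the workload, which is the question the abstract addresses.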