With the growing focus on semantic searches, an increasing number of standardized ontologies are being designed to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second defines it as the average pairwise term distance. The third and most useful distance measure, the earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be aggregated progressively and utilize them to speed up the search for the top-k object pairs when the earth mover's distance is used. For the minimum pairwise distance, we devise an algorithm that runs in O(D + Tk log k) time, where D is the total information size and T is the number of terms in the ontology. We also develop a best-first search strategy for the average pairwise distance that utilizes lower bounds generated in an ordered manner. Experiments on real and synthetic datasets demonstrate the practicality and scalability of our algorithms.
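To make the three distance measures concrete, the following is a minimal sketch and not the algorithms developed in the paper. It rests on several assumptions that the abstract does not state: the toy ontology `PARENT` is hypothetical, the term distance is taken to be the number of edges on the tree path between two terms, and the earth mover's distance is computed with uniform term weights by brute force, which is feasible only for very small term sets.

```python
from itertools import permutations
from math import gcd

# Hypothetical toy ontology (assumption): child -> parent, root maps to None.
PARENT = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "b1": "b",
}

def ancestors(term):
    """Return the path from term up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def term_distance(s, t):
    """Edges on the tree path between s and t (an assumed term metric)."""
    steps_from_s = {node: i for i, node in enumerate(ancestors(s))}
    for j, node in enumerate(ancestors(t)):
        if node in steps_from_s:  # first common ancestor
            return steps_from_s[node] + j
    raise ValueError("terms are not in the same tree")

def min_pairwise(A, B):
    """Smallest term distance over all cross pairs."""
    return min(term_distance(s, t) for s in A for t in B)

def avg_pairwise(A, B):
    """Mean term distance over all cross pairs."""
    total = sum(term_distance(s, t) for s in A for t in B)
    return total / (len(A) * len(B))

def emd_uniform(A, B):
    """Earth mover's distance with uniform weights (assumption), brute force.

    Each term is replicated so both sides hold n = lcm(|A|, |B|) unit
    masses, then the cheapest one-to-one matching is found exhaustively.
    """
    A, B = sorted(A), sorted(B)
    n = len(A) * len(B) // gcd(len(A), len(B))
    left = [s for s in A for _ in range(n // len(A))]
    right = [t for t in B for _ in range(n // len(B))]
    best = min(
        sum(term_distance(s, t) for s, t in zip(left, perm))
        for perm in permutations(right)
    )
    return best / n

x = {"a1", "a2"}
y = {"a1", "b1"}
print(min_pairwise(x, y), avg_pairwise(x, y), emd_uniform(x, y))  # 0 2.5 2.0
```

On this toy input the minimum pairwise distance is 0 because the two objects share the term `a1`, while the earth mover's distance is 2.0 because `a2` must still be matched to `b1`. This illustrates why a matching-based measure can separate objects that the minimum measure treats as identical.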