Finding top-k similar pairs of objects annotated with terms from an ontology

  • Authors:
  • Arnab Bhattacharya;Abhishek Bhowmick;Ambuj K. Singh

  • Affiliations:
  • Computer Science and Engineering, Indian Institute of Technology, Kanpur, India;Computer Science and Engineering, Indian Institute of Technology, Kanpur, India;Computer Science, University of California, Santa Barbara

  • Venue:
  • SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
  • Year:
  • 2010

Abstract

With the growing focus on semantic searches, an increasing number of standardized ontologies are being designed to describe data. We investigate the querying of objects described by a tree-structured ontology. Specifically, we consider the case of finding the top-k best pairs of objects that have been annotated with terms from such an ontology when the object descriptions are available only at runtime. We consider three distance measures. The first one defines the object distance as the minimum pairwise distance between the sets of terms describing them, and the second one defines the distance as the average pairwise term distance. The third and most useful distance measure, the earth mover's distance, finds the best way of matching the terms and computes the distance corresponding to this best matching. We develop lower bounds that can be aggregated progressively and utilize them to speed up the search for top-k object pairs when the earth mover's distance is used. For the minimum pairwise distance, we devise an algorithm that runs in O(D + Tk log k) time, where D is the total information size and T is the number of terms in the ontology. We also develop a best-first search strategy for the average pairwise distance that utilizes lower bounds generated in an ordered manner. Experiments on real and synthetic datasets demonstrate the practicality and scalability of our algorithms.
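
To make the three distance measures concrete, the following is a minimal brute-force sketch in Python. It is not the paper's method: it uses none of the lower bounds, the best-first search, or the O(D + Tk log k) algorithm described above. Several things are assumptions made only for illustration: the toy ontology in PARENT, the object annotations in OBJECTS, the choice of tree path length (number of edges) as the term distance, and the restriction of the earth mover's distance to equal-size term sets with uniform weights, where it reduces to the cheapest one-to-one matching.

```python
import heapq
from itertools import combinations, permutations

# Hypothetical toy ontology: each term maps to its parent (the root maps to None).
PARENT = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b"}

# Hypothetical objects, each annotated with a set of ontology terms.
OBJECTS = {"o1": {"a1", "a2"}, "o2": {"a1", "b1"}, "o3": {"b1", "b"}}


def ancestors(term):
    """Return the path from term up to the root, starting with term itself."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def term_distance(s, t):
    """Assumed term distance: number of tree edges on the path between s and t."""
    steps_from_s = {node: i for i, node in enumerate(ancestors(s))}
    for j, node in enumerate(ancestors(t)):
        if node in steps_from_s:  # first common ancestor on t's upward path
            return steps_from_s[node] + j
    raise ValueError("terms are not in the same tree")


def min_pairwise(A, B):
    """Minimum term distance over all cross pairs of the two term sets."""
    return min(term_distance(s, t) for s in A for t in B)


def avg_pairwise(A, B):
    """Average term distance over all cross pairs of the two term sets."""
    total = sum(term_distance(s, t) for s in A for t in B)
    return total / (len(A) * len(B))


def emd_equal_size(A, B):
    """Earth mover's distance for equal-size sets with uniform weights.

    Under that assumption the optimum is a one-to-one matching, found here
    by trying every permutation (only feasible for very small sets).
    """
    A, B = list(A), list(B)
    if len(A) != len(B):
        raise ValueError("this sketch only handles equal-size term sets")
    best = min(sum(term_distance(s, t) for s, t in zip(A, perm))
               for perm in permutations(B))
    return best / len(A)


def top_k_pairs(objects, k, distance):
    """Brute force: score every object pair and keep the k smallest distances."""
    scored = ((distance(objects[x], objects[y]), x, y)
              for x, y in combinations(sorted(objects), 2))
    return heapq.nsmallest(k, scored)
```

Calling top_k_pairs(OBJECTS, 2, min_pairwise) ranks all object pairs by the chosen measure. This exhaustive baseline evaluates every pair, which is the cost that the lower-bound and best-first strategies in the abstract aim to avoid.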