A novel optimization approach to efficiently process aggregate similarity queries in metric access methods

  • Authors:
  • Humberto L. Razente;Maria Camila N. Barioni;Agma Juci M. Traina;Christos Faloutsos;Caetano Traina, Jr.

  • Affiliations:
  • University of São Paulo (USP), São Carlos (SP), Brazil;Federal University of ABC (UFABC), Santo Andre (SP), Brazil;University of São Paulo (USP), Sao Carlos (SP), Brazil;Carnegie Mellon University (CMU), Pittsburgh, PA, USA;University of São Paulo (USP), Sao Carlos (SP), Brazil

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Abstract

A similarity query takes an element as the query center and searches a dataset for either the elements within a bounding radius of the center or the k elements nearest to it. Several algorithms have been developed to execute similarity queries efficiently. However, some queries require more than one center; we call these Aggregate Similarity Queries. Such queries arise when the user gives multiple desirable examples and requests data elements that are similar to all of them, as in relevance feedback. Here we give the first algorithms that can handle aggregate similarity queries on Metric Access Methods (MAMs) such as the M-tree and Slim-tree. Our method, which we call Metric Aggregate Similarity Search (MASS), has the following properties: (a) it requires only the triangle inequality property; (b) it guarantees no false dismissals, as we prove that it lower-bounds the aggregate distance scores; (c) it can work with any MAM; (d) it can handle any number of query centers, whether they are scattered all over the space or concentrated in a restricted region. Experiments on both real and synthetic data show that our method scales with the number of elements and, if the dataset is in a spatial domain, also with its dimensionality. Moreover, it achieves better results than previous related methods.
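
To make the idea of lower-bounding an aggregate distance concrete, here is a minimal Python sketch. It is not the MASS algorithm from the paper. It assumes the aggregate score is the sum of distances to all query centers, uses Euclidean distance as the metric, and replaces a real MAM with a flat list of covering balls given as `(pivot, radius, elements)` tuples. All function names are hypothetical. The bound relies only on the triangle inequality: for any element x inside a ball, d(q, x) >= d(q, pivot) - radius, so summing the clamped right-hand side over all centers can never overestimate the aggregate score of any element in that ball.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance; any metric obeying the triangle inequality would do."""
    return math.dist(a, b)


def aggregate_distance(x, centers, dist=euclidean):
    """Assumed aggregate score: sum of distances from x to every query center."""
    return sum(dist(q, x) for q in centers)


def ball_lower_bound(pivot, radius, centers, dist=euclidean):
    """Lower bound on the aggregate score of any element inside the ball.

    Triangle inequality: d(q, x) >= d(q, pivot) - radius, clamped at zero.
    """
    return sum(max(0.0, dist(q, pivot) - radius) for q in centers)


def aggregate_knn(balls, centers, k, dist=euclidean):
    """Return the k elements with the smallest aggregate score.

    `balls` is a list of (pivot, radius, elements) tuples standing in for
    the nodes of a metric access method (an assumption of this sketch).
    Also returns how many balls were actually scanned.
    """
    # Visit balls in ascending order of their lower bound.
    ordered = sorted(
        ((ball_lower_bound(p, r, centers, dist), i) for i, (p, r, _) in enumerate(balls))
    )
    heap = []  # max-heap of the best k so far, stored as (-score, element)
    scanned = 0
    for bound, i in ordered:
        # Once k candidates exist and the bound cannot beat the worst of
        # them, no remaining ball can contribute: stop without dismissals.
        if len(heap) == k and bound >= -heap[0][0]:
            break
        scanned += 1
        for x in balls[i][2]:
            score = aggregate_distance(x, centers, dist)
            if len(heap) < k:
                heapq.heappush(heap, (-score, x))
            elif score < -heap[0][0]:
                heapq.heapreplace(heap, (-score, x))
    result = sorted((-neg, x) for neg, x in heap)
    return result, scanned
```

Because a ball is skipped only when its lower bound already meets or exceeds the current k-th best score, the result matches a brute-force scan over all elements, which is the sense in which a lower-bounding filter avoids false dismissals. With a different aggregate (for example the maximum distance instead of the sum), the bound would need to be adapted accordingly.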