Efficiency-quality tradeoffs for vector score aggregation

Authors:
Pavan Kumar C. Singitham;Mahathi S. Mahabhashyam;Prabhakar Raghavan
Affiliations:
Stanford University, Stanford;Stanford University, Stanford;Verity Inc., Sunnyvale
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004



Abstract

Finding the l nearest neighbors to a query in a vector space is an important primitive in text and image retrieval. Here we study an extension of this problem with applications to XML and image retrieval: we have multiple vector spaces, and the query places a weight on each space. Match scores from the spaces are weighted by these weights to determine the overall match between each record and the query; this is a case of score aggregation. We study approximation algorithms that use a small fraction of the computation of exhaustive search through all records, while returning nearly the best matches. We focus on the tradeoff between the computation and the quality of the results. We develop two approaches to retrieval from such multiple vector spaces. The first is inspired by resource allocation. The second, inspired by computational geometry, combines the multiple vector spaces together with all possible query weights into a single larger space. While mathematically elegant, this abstraction is intractable for implementation. We therefore devise an approximation of this combined space. Experiments show that all our approaches (to varying extents) enable retrieval quality comparable to exhaustive search, while avoiding its heavy computational cost.
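To make the score-aggregation setting concrete, here is a minimal sketch of the exhaustive baseline that the abstract's approximation algorithms are compared against. It is not the authors' code. The abstract does not say which match score is used within each space, so cosine similarity is an assumption here, as are the function names (`cosine`, `aggregate_score`, `exhaustive_top_l`) and the toy data.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def aggregate_score(record, query, weights):
    """Weighted sum of per-space match scores (cosine is an assumption)."""
    return sum(w * cosine(r, q) for w, r, q in zip(weights, record, query))


def exhaustive_top_l(records, query, weights, l):
    """Score every record and return the l best as (score, record_id) pairs."""
    scored = (
        (aggregate_score(rec, query, weights), rid)
        for rid, rec in records.items()
    )
    return heapq.nlargest(l, scored)


# Hypothetical toy data: each record has one vector per space
# (for example, a text space and an image space).
records = {
    "a": ([1.0, 0.0], [0.0, 1.0]),
    "b": ([0.0, 1.0], [1.0, 0.0]),
    "c": ([1.0, 1.0], [1.0, 1.0]),
}
query = ([1.0, 0.0], [1.0, 0.0])

# The query places weight 0.8 on the first space and 0.2 on the second.
top = exhaustive_top_l(records, query, weights=(0.8, 0.2), l=2)
```

This baseline touches every record in every space, which is the computational cost the paper sets out to avoid. An approximation method would be judged by how closely its top-l list matches `top` while scoring far fewer records.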