An access cost-aware approach for object retrieval over multiple sources

Authors:
Benjamin Arai;Gautam Das;Dimitrios Gunopulos;Vagelis Hristidis;Nick Koudas
Affiliations:
University of California, Riverside;University of Texas, Arlington;University of Athens, Greece;Florida International University;University of Toronto
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Source and object selection and retrieval from large multi-source data sets are fundamental operations in many applications. In this paper, we initiate research on efficient source (e.g., database) and object selection algorithms on large multi-source data sets. Specifically, to acquire a specified number of satisfying objects at minimum cost over multiple databases, the query engine needs to determine the access overhead of each data source, the overhead of retrieving objects from each source, and possibly other statistics, such as an estimate of how frequently a retrieved object satisfies the query, in order to decide how many objects to retrieve from each data source. We adopt a probabilistic approach to source selection that uses a cost structure and a dynamic programming model to compute the optimal number of objects to retrieve from each data source. Such a structure can be a valuable asset where there is a monetary or time-related cost associated with accessing large distributed databases. We present a thorough experimental evaluation to validate our techniques using real-world data sets.
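To make the idea of a cost structure combined with dynamic programming more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simplified cost model that the abstract does not spell out: each source charges a one-time access cost plus a per-object retrieval cost, and each retrieved object satisfies the query with a known, fixed probability. The names `Source`, `plan_retrieval`, `hit_percent`, and `max_objects` are hypothetical. The sketch picks how many objects to fetch from each source so that the expected number of satisfying objects reaches k at minimum total cost.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Source:
    name: str
    access_cost: float      # assumed: paid once if the source is used at all
    per_object_cost: float  # assumed: paid for every object retrieved
    hit_percent: int        # assumed: chance (0-100) that an object satisfies the query
    max_objects: int        # assumed: most objects the source can return


def plan_retrieval(sources: List[Source], k: int) -> Tuple[float, List[int]]:
    """Return (min cost, objects to fetch per source) so that the expected
    number of satisfying objects is at least k. Returns (inf, []) when even
    exhausting every source cannot reach k in expectation."""
    # Expected hits are tracked in hundredths so the DP state is an integer.
    target = 100 * k
    inf = float("inf")
    # best[h] = cheapest plan over the sources processed so far whose
    # expected hits equal h (values above the target are capped at target).
    best: List[Tuple[float, List[int]]] = [(inf, [])] * (target + 1)
    best[0] = (0.0, [])
    for s in sources:
        nxt: List[Tuple[float, List[int]]] = [(inf, [])] * (target + 1)
        for h, (cost, counts) in enumerate(best):
            if cost == inf:
                continue  # state not reachable
            for n in range(s.max_objects + 1):
                # The access cost is only paid when the source is used.
                extra = s.access_cost + n * s.per_object_cost if n else 0.0
                h2 = min(target, h + n * s.hit_percent)
                if cost + extra < nxt[h2][0]:
                    nxt[h2] = (cost + extra, counts + [n])
        best = nxt
    return best[target]
```

For example, calling `plan_retrieval([Source("A", 5.0, 1.0, 50, 10), Source("B", 1.0, 2.0, 100, 3)], k=3)` compares paying A's higher access cost for cheap but less reliable objects against B's cheaper access with costlier, more reliable objects. The table has one entry per hundredth of an expected hit, so the running time grows with the number of sources, the per-source limits, and k; a real system would likely also need to estimate the hit probabilities rather than assume them.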