Bypass Caching: Making Scientific Databases Good Network Citizens

Authors:
Tanu Malik; Randal Burns; Amitabh Chaudhary
Affiliations:
Johns Hopkins University; Johns Hopkins University; University of Notre Dame
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Abstract

Scientific database federations are geographically distributed and network bound. Thus, they could benefit from proxy caching. However, existing caching techniques are not suitable for their workloads, which compare and join large data sets. Existing techniques reduce parallelism by conducting distributed queries in a single cache and lose the data reduction benefits of performing selections at each database. We develop the bypass-yield formulation of caching, which reduces network traffic in wide-area database federations, while preserving parallelism and data reduction. Bypass-yield caching is altruistic; caches minimize the overall network traffic generated by the federation, rather than focusing on local performance. We present an adaptive, workload-driven algorithm for managing a bypass-yield cache. We also develop on-line algorithms that make no assumptions about workload: a k-competitive deterministic algorithm and a randomized algorithm with minimal space complexity. We verify the efficacy of bypass-yield caching by running workload traces collected from the Sloan Digital Sky Survey through a prototype implementation.
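
The bypass-or-load decision described in the abstract can be illustrated with a small rent-or-buy sketch. This is not the paper's algorithm. It assumes that each query touches a single object, that a bypassed query costs its result size ("yield") in network bytes, that loading an object costs its full size, and that nothing is ever evicted. The class name `BypassYieldSketch` and all field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BypassYieldSketch:
    """Illustrative rent-or-buy cache; not the algorithm from the paper."""
    capacity: int                       # cache capacity in bytes
    sizes: dict                         # object name -> size in bytes (load cost)
    cached: set = field(default_factory=set)
    bypassed_bytes: dict = field(default_factory=dict)
    network_bytes: int = 0              # total wide-area traffic so far

    def query(self, obj: str, yield_bytes: int) -> str:
        """Serve one query on obj whose result is yield_bytes long."""
        if obj in self.cached:
            return "hit"                # served from the cache, no WAN traffic
        spent = self.bypassed_bytes.get(obj, 0)
        used = sum(self.sizes[o] for o in self.cached)
        fits = used + self.sizes[obj] <= self.capacity
        if fits and spent + yield_bytes >= self.sizes[obj]:
            # Bypassing would have cost as much as loading: load the object.
            self.cached.add(obj)
            self.network_bytes += self.sizes[obj]
            self.bypassed_bytes.pop(obj, None)
            return "load"
        # Otherwise ship the query to the server and pay only for the result.
        self.bypassed_bytes[obj] = spent + yield_bytes
        self.network_bytes += yield_bytes
        return "bypass"


cache = BypassYieldSketch(capacity=100, sizes={"a": 60, "b": 80})
decisions = [cache.query("a", 20), cache.query("a", 30),
             cache.query("a", 20), cache.query("a", 40),
             cache.query("b", 90)]
print(decisions, cache.network_bytes)
```

In this sketch a selective query (small yield) is bypassed so that the selection still runs at the database, and an object is loaded only once its accumulated bypass traffic would match its load cost. The final query on `b` is bypassed because `b` does not fit beside `a`; a real cache would need a replacement policy at that point.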