Demand-based document dissemination to reduce traffic and balance load in distributed information systems

Authors:
A. Bestavros
Affiliations:
-
Venue:
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing
Year:
1995

Citing 0
Cited 25

Using speculation to reduce server load and service time on the WWW
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management

Maintaining Strong Cache Consistency in the World Wide Web
IEEE Transactions on Computers

Snowball: Scalable Storage on Networks of Workstations with Balanced Load
Distributed and Parallel Databases

Partial replica selection based on relevance for information retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Partial collection replication versus caching for information retrieval systems
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval

The cache location problem
IEEE/ACM Transactions on Networking (TON)

Migrating Autonomous Objects in a WAN Environment
Journal of Intelligent Information Systems

Summary of WWW characterizations
World Wide Web

Identifying Dynamic Replication Strategies for a High-Performance Data Grid
GRID '01 Proceedings of the Second International Workshop on Grid Computing

Data partitioning and load balancing in parallel disk systems
The VLDB Journal — The International Journal on Very Large Data Bases

Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing

Server Selection Using Dynamic Path Characterization in Wide-Area Networks
INFOCOM '97 Proceedings of the INFOCOM '97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution

Operational requirements for scalable search systems
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Replication algorithms for the World-Wide Web
Journal of Systems Architecture: the EUROMICRO Journal

Subscription-enhanced content delivery
Web content caching and distribution

World-wide web cache consistency
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference

Web++: a system for fast and reliable web service
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference

File grouping for scientific data management: lessons from experimenting with real traces
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing

Content distribution for publish/subscribe services
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware

Workload characterization in a high-energy data grid and impact on resource management
Cluster Computing

Pricing strategies for differentiated services content delivery networks
Computer Networks: The International Journal of Computer and Telecommunications Networking

Replicating web contents using a hybrid particle swarm optimization
Information Processing and Management: an International Journal

Caching and Materialization for Web Databases
Foundations and Trends in Databases

ACB-R: an adaptive clustering-based data replication algorithm on a p2p data-store
ASIAN'05 Proceedings of the 10th Asian Computing Science conference on Advances in computer science: data management on the web

A novel cache distribution heuristic algorithm for a mesh of caches and its performance evaluation
Computer Communications

Abstract

Research on replication techniques to reduce traffic and minimize the latency of information retrieval in a distributed system has concentrated on client-based caching, whereby recently or frequently accessed information is cached at a client (or at a proxy thereof) in anticipation of future accesses. We believe that such myopic solutions, which focus exclusively on a particular client or set of clients, are likely to have a limited impact. Instead, we offer a solution that allows the replication of information to be done on a global supply/demand basis. We propose a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers. The level of dissemination depends on the relative popularity of documents and on the expected reduction in traffic that results from such dissemination. We used extensive HTTP logs to validate an analytical model of server popularity and file access profiles. Using that model, we show that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced. We argue that this process could be generalized to provide for an automated server-based information dissemination protocol that will be more effective in reducing both network bandwidth and document retrieval times than client-based caching protocols.
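
The idea of pushing the most popular documents toward their consumers can be illustrated with a small sketch. This is not the paper's algorithm or analytical model: it is a simple greedy heuristic written for illustration, and the function names, the log format (a plain list of requested document names), the sample sizes, and the storage budget are all assumptions.

```python
from collections import Counter


def pick_documents_to_disseminate(access_log, sizes, budget_bytes):
    """Greedily choose documents to copy to a server nearer the clients.

    access_log   -- iterable of requested document names (hypothetical format)
    sizes        -- dict mapping document name to size in bytes (all > 0)
    budget_bytes -- storage assumed to be available on the nearer server
    """
    hits = Counter(access_log)
    # Rank by requests per byte so that small, popular documents go first;
    # ties are broken by name to keep the result deterministic.
    ranked = sorted(hits, key=lambda d: (-hits[d] / sizes[d], d))
    chosen, used = [], 0
    for doc in ranked:
        if used + sizes[doc] <= budget_bytes:
            chosen.append(doc)
            used += sizes[doc]
    return chosen


def bytes_saved(access_log, sizes, chosen):
    """Bytes that no longer have to travel from the home server."""
    chosen = set(chosen)
    return sum(sizes[d] for d in access_log if d in chosen)


# Illustrative data only, not taken from the paper's HTTP logs.
log = ["a.html", "b.gif", "a.html", "c.ps", "a.html", "b.gif"]
sizes = {"a.html": 2_000, "b.gif": 10_000, "c.ps": 50_000}

chosen = pick_documents_to_disseminate(log, sizes, budget_bytes=15_000)
print(chosen, bytes_saved(log, sizes, chosen))
```

In this toy run, the two most requested documents fit within the budget, so their requests would be served by the nearer server, while the large, rarely requested file stays at its home server.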