Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes, because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated across data centers based on regional user interests. Within this framework, we propose three document replication strategies, each optimizing a different objective: the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
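The abstract does not spell out the replication strategies themselves. As a rough illustration of what "selectively replicated based on regional user interests" under an index-size constraint can mean, here is a minimal greedy sketch. It is not the paper's method. It assumes a hypothetical `interest[site][doc]` score, which is an estimate of how many of a region's queries retrieve a document. It also assumes a hypothetical `capacity[site]` limit expressed as a number of documents, although the abstract does not say which two constraint types the paper uses. The function names `replicate_by_regional_interest` and `local_coverage` are likewise illustrative.

```python
from typing import Dict, Set

# Hypothetical inputs (assumptions, not from the paper):
#   interest[site][doc] -> estimated number of that region's queries retrieving doc
#   capacity[site]      -> maximum number of documents the site's index may hold
Interest = Dict[str, Dict[str, float]]


def replicate_by_regional_interest(
    interest: Interest, capacity: Dict[str, int]
) -> Dict[str, Set[str]]:
    """Greedy selective replication: each data center indexes the documents
    its own region is most interested in, up to its index-size capacity."""
    placement: Dict[str, Set[str]] = {}
    for site, scores in interest.items():
        # Rank by regional interest, highest first; break ties by doc id so
        # the result is deterministic.
        ranked = sorted(scores, key=lambda doc: (-scores[doc], doc))
        placement[site] = set(ranked[: capacity[site]])
    return placement


def local_coverage(interest: Interest, placement: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each region's interest mass that its local replica can serve.
    The remainder would need query forwarding or would risk quality loss."""
    coverage: Dict[str, float] = {}
    for site, scores in interest.items():
        total = sum(scores.values())
        served = sum(scores[doc] for doc in placement[site])
        coverage[site] = served / total if total else 1.0
    return coverage
```

For example, suppose `interest = {"eu": {"d1": 50, "d2": 30, "d3": 5}, "us": {"d1": 10, "d2": 2, "d3": 40}}` and `capacity = {"eu": 2, "us": 1}`. Then the European site keeps `d1` and `d2`, covering 80/85 of its interest mass. The US site keeps only `d3`, covering 40/52. A strategy that targets response time or workload, rather than locally served interest, would rank documents differently. This sketch only shows the basic trade-off between a site's index size and the share of its queries it can answer locally.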