Effective data distribution and reallocation strategies for fast query response in distributed query-intensive data environments

  • Authors:
  • Tengjiao Wang;Bishan Yang;Jun Gao;Dongqing Yang

  • Affiliations:
  • Key Laboratory of High Confidence Software Technologies, Peking University, Ministry of Education, China and School of Electronics Engineering and Computer Science, Peking University, Beijing, Chi ...

  • Venue:
  • APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
  • Year:
  • 2008

Abstract

Modern large distributed applications, such as mobile communications and banking services, require fast responses to enormous numbers of frequent query requests. Applications of this kind are usually deployed in a distributed query-intensive data environment, where the system response time depends significantly on how the data is distributed. Motivated by this need for efficiency, we develop two novel strategies that reduce query response time through load balancing: a static data distribution strategy, DDH, and a dynamic data reallocation strategy, DRC. DDH uses a hash-based heuristic technique to distribute data off-line according to the query history. DRC reallocates data dynamically at runtime to adapt to the changing query patterns in the system. To validate the performance of these two strategies, experiments are conducted using a simulation environment and real customer data. Experimental results show that both strategies offer favorable performance as the query load of the system increases.
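
The abstract does not give the details of DDH or DRC, so the following is only an illustrative sketch of the general idea of balancing load from a query history. It is not the authors' algorithm. It assumes a plain greedy placement (hottest key first, onto the currently least-loaded node) in place of the hash-based heuristic, and a single-key move from the busiest to the idlest node in place of the paper's reallocation procedure. The function names `distribute_by_history`, `node_loads`, and `reallocate_once` are hypothetical, as is the assumption that every queried key already has a placement.

```python
import heapq
from collections import Counter


def distribute_by_history(query_log, num_nodes):
    """Off-line placement: assign keys to nodes using past query counts.

    Assumption: a greedy stand-in, not the paper's DDH heuristic.
    """
    freq = Counter(query_log)
    # Min-heap of (accumulated load, node id); the lightest node is on top.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {}
    # Place the most frequently queried keys first; ties broken by key.
    for key, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        placement[key] = node
        heapq.heappush(heap, (load + count, node))
    return placement


def node_loads(placement, query_log, num_nodes):
    """Count how many queries in the log land on each node."""
    loads = [0] * num_nodes
    for key in query_log:
        loads[placement[key]] += 1
    return loads


def reallocate_once(placement, recent_log, num_nodes):
    """Runtime step: move one key from the busiest to the idlest node.

    Assumption: a simplified stand-in, not the paper's DRC strategy.
    Returns the moved key, or None if no single move narrows the gap.
    """
    freq = Counter(recent_log)
    loads = node_loads(placement, recent_log, num_nodes)
    hot = max(range(num_nodes), key=loads.__getitem__)
    cold = min(range(num_nodes), key=loads.__getitem__)
    gap = loads[hot] - loads[cold]
    # Only keys lighter than the gap can shrink the imbalance when moved.
    candidates = [(k, c) for k, c in freq.items()
                  if placement[k] == hot and c < gap]
    if not candidates:
        return None
    # Prefer the key whose load is closest to half of the gap.
    key, _ = min(candidates, key=lambda kc: (abs(gap - 2 * kc[1]), kc[0]))
    placement[key] = cold
    return key
```

In this sketch, `distribute_by_history` would be run once over a recorded query log, and `reallocate_once` would be called periodically on a window of recent queries, so that a shift in which keys are hot gradually moves data toward less busy nodes.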