Approximation Algorithms for Data Placement Problems

  • Authors: Ivan Baev; Rajmohan Rajaraman; Chaitanya Swamy
  • Affiliations: ivan.baev@hp.com; rraj@ccs.neu.edu; cswamy@math.uwaterloo.ca
  • Venue: SIAM Journal on Computing
  • Year: 2008

Abstract

We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects, so as to minimize the average data-access cost. We introduce the data placement problem to model this setting. We have a set of caches $\mathcal{F}$, a set of clients $\mathcal{D}$, and a set of data objects $\mathcal{O}$. Each cache $i$ can store at most $u_i$ data objects. Each client $j\in\mathcal{D}$ has demand $d_j$ for a specific data object $o(j)\in\mathcal{O}$ and has to be assigned to a cache that stores that object. Storing an object $o$ in cache $i$ incurs a storage cost of $f_i^o$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j c_{ij}$. The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients to caches, so as to minimize the total storage and client access costs. We present a 10-approximation algorithm for this problem. Our algorithm is based on rounding an optimal solution to a natural linear-programming relaxation of the problem. One of the main technical challenges encountered during rounding is to preserve the cache capacities while incurring only a constant-factor increase in the solution cost. We also introduce the connected data placement problem to capture settings where write requests are also issued for data objects, so that one requires a mechanism to maintain consistency of data. We model this by requiring that all caches containing a given object be connected by a Steiner tree to a root for that object, which issues a multicast message upon a write to (any copy of) that object. The total cost now includes the cost of these Steiner trees. We devise a 14-approximation algorithm for this problem.
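
For orientation, one plausible way to write the linear-programming relaxation of the (unconnected) data placement problem is sketched below. The abstract does not state the program, so this is a reconstruction from the definitions above: the variables $y_i^o$ (extent to which object $o$ is stored in cache $i$) and $x_{ij}$ (extent to which client $j$ is assigned to cache $i$) are assumed names.

```latex
\begin{align*}
\min \quad & \sum_{i\in\mathcal{F}}\sum_{o\in\mathcal{O}} f_i^o\, y_i^o
             \;+\; \sum_{j\in\mathcal{D}}\sum_{i\in\mathcal{F}} d_j c_{ij}\, x_{ij} \\
\text{s.t.} \quad
  & \sum_{i\in\mathcal{F}} x_{ij} = 1
      && \text{for all } j\in\mathcal{D}, \\
  & x_{ij} \le y_i^{o(j)}
      && \text{for all } i\in\mathcal{F},\ j\in\mathcal{D}, \\
  & \sum_{o\in\mathcal{O}} y_i^o \le u_i
      && \text{for all } i\in\mathcal{F}, \\
  & x_{ij} \ge 0,\quad 0 \le y_i^o \le 1
      && \text{for all } i\in\mathcal{F},\ j\in\mathcal{D},\ o\in\mathcal{O}.
\end{align*}
```

Restricting $x_{ij}$ and $y_i^o$ to $\{0,1\}$ gives the integer problem described in the abstract; the third constraint is the cache capacity that the rounding has to preserve.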
We show that our algorithms can be adapted to handle two variants of the problem: (a) a $k$-median variant, where there is a specified bound on the number of caches that may contain a given object, and (b) a generalization where objects have lengths and the total length of the objects stored in any cache must not exceed its capacity.
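
To make the objective concrete, here is a minimal Python sketch that checks feasibility of a given placement and assignment for the basic data placement problem and computes its total cost. It is not the approximation algorithm from the paper. The function name `placement_cost`, the dictionary layout, and the toy instance are all illustrative assumptions; only the symbols $u_i$, $d_j$, $o(j)$, $f_i^o$, and $c_{ij}$ come from the abstract.

```python
def placement_cost(u, d, o, f, c, stored, assign):
    """Total storage + access cost of a solution; raises ValueError if infeasible.

    u[i]      capacity of cache i (max number of objects)
    d[j]      demand of client j
    o[j]      the object client j requests
    f[i][ob]  cost of storing object ob in cache i
    c[i][j]   per-unit access cost between cache i and client j
    stored[i] set of objects placed in cache i   (hypothetical solution format)
    assign[j] cache that client j is assigned to (hypothetical solution format)
    """
    # Capacity constraint: cache i holds at most u[i] objects.
    for i, objs in stored.items():
        if len(objs) > u[i]:
            raise ValueError(f"cache {i} exceeds its capacity")
    # Every client must be assigned to a cache that stores its object.
    for j in d:
        if j not in assign:
            raise ValueError(f"client {j} is unassigned")
        if o[j] not in stored.get(assign[j], set()):
            raise ValueError(f"client {j} assigned to a cache without {o[j]}")
    storage = sum(f[i][ob] for i, objs in stored.items() for ob in objs)
    access = sum(d[j] * c[assign[j]][j] for j in d)
    return storage + access


# Toy instance (made up for illustration): two caches, two objects, three clients.
u = {"a": 1, "b": 1}
d = {1: 2, 2: 1, 3: 3}
o = {1: "x", 2: "y", 3: "x"}
f = {"a": {"x": 4, "y": 5}, "b": {"x": 3, "y": 1}}
c = {"a": {1: 1, 2: 2, 3: 6}, "b": {1: 5, 2: 1, 3: 2}}

stored = {"a": {"x"}, "b": {"y"}}
assign = {1: "a", 2: "b", 3: "a"}
total = placement_cost(u, d, o, f, c, stored, assign)
print(total)  # storage 4 + 1, access 2*1 + 1*1 + 3*6, so 26
```

In this instance client 3 would be cheaper to serve from cache `b`, but `b` has capacity 1 and already holds `y`; this is the kind of tension between capacities and access cost that the abstract identifies as the main difficulty in rounding.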