Distributed Garbage Collection Algorithms for Timestamped Data

  • Authors:
  • Umakishore Ramachandran;Kathleen Knobe;Nissim Harel;Hasnain A. Mandviwala

  • Affiliations:
  • IEEE;-;-;-

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

There is an important class of interactive multimedia applications that deals with stream data from distributed sources. Indexing the data temporally facilitates ordering individual streams as well as correlating items from different streams. The Stampede programming system organizes stream data into channels that are distributed and synchronized data structures that contain timestamped items. A Stampede program is a data flow graph of threads and channels. Stampede semantics for channels allow concurrent access from multiple threads for input and output. While a channel holds timestamped items, the semantics do not place any restriction on either the production or consumption order of these items. Furthermore, timestamps of items in a channel need not be contiguous. These flexibilities are required due to the dynamic and parallel structure of stream-oriented applications targeted by the Stampede system. Under such circumstances, a key issue is the "garbage collection” (GC) of channel items. In this paper, we present and compare three different GC algorithms: 1) REF is a simple algorithm that keeps a reference count on individual items; 2) TGC is a distributed algorithm for computing a global low watermark for timestamp values of interest in the entire application; 3) DGC is another distributed algorithm that uses information about the dependencies between the producers and consumers of data streams to compute a low water mark local to each node of the data flow graph. DGC can simultaneously eliminate garbage from channels and unneeded computations from threads. In tests performed using an interactive application, DGC enjoys nearly 30 percent reduction in the application memory footprint compared to TGC and REF. DGC and REF are also shown to be more scalable compared to TGC.