Compact histograms for hierarchical identifiers

  • Authors:
  • Frederick Reiss;Minos Garofalakis;Joseph M. Hellerstein

  • Affiliations:
  • U.C. Berkeley Department of Electrical Engineering and Computer Science;Intel Research Berkeley;U.C. Berkeley Department of Electrical Engineering and Computer Science

  • Venue:
  • VLDB '06 Proceedings of the 32nd international conference on Very large data bases
  • Year:
  • 2006

Abstract

Distributed monitoring applications often involve streams of unique identifiers (UIDs) such as IP addresses or RFID tag IDs. An important class of query for such applications involves partitioning the UIDs into groups using a large lookup table; the query then performs aggregation over the groups. We propose using histograms to reduce bandwidth utilization in such settings, using a histogram partitioning function as a compact representation of the lookup table. We investigate methods for constructing histogram partitioning functions for lookup tables over unique identifiers that form a hierarchy of contiguous groups, as is the case with network addresses and several other types of UID. Each bucket in our histograms corresponds to a subtree of the hierarchy. We develop three novel classes of partitioning functions for this domain, which vary in their structure, construction time, and estimation accuracy.

Our approach provides several advantages over previous work. We show that optimal instances of our partitioning functions can be constructed efficiently from large lookup tables. The partitioning functions are also compact, with each partition represented by a single identifier. Finally, our algorithms support minimizing any error metric that can be expressed as a distributive aggregate; and they extend naturally to multiple hierarchical dimensions. In experiments on real-world network monitoring data, we show that our histograms provide significantly higher accuracy per bit than existing techniques.
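The idea that each bucket corresponds to a subtree of the hierarchy and is represented by a single identifier can be illustrated with a small sketch. This is not the authors' construction algorithm: it assumes IPv4 addresses as the UIDs, CIDR prefixes as the subtree identifiers, and a most-specific-prefix rule for assigning an address to a bucket. The function names, the bucket prefixes, and the sample stream are all hypothetical and chosen only for illustration.

```python
import ipaddress
from collections import Counter


def bucket_for(address, buckets):
    """Return the most specific bucket (subtree) containing the address, or None.

    Assumption for illustration: when buckets nest, the longest prefix wins.
    """
    ip = ipaddress.ip_address(address)
    matches = [b for b in buckets if ip in b]
    if not matches:
        return None
    return max(matches, key=lambda b: b.prefixlen)


def histogram(addresses, bucket_prefixes):
    """Count how many addresses in the stream fall into each bucket."""
    buckets = [ipaddress.ip_network(p) for p in bucket_prefixes]
    counts = Counter()
    for address in addresses:
        bucket = bucket_for(address, buckets)
        counts[str(bucket) if bucket is not None else "unmatched"] += 1
    return dict(counts)


# Hypothetical partitioning function: each bucket is one prefix, i.e. one subtree.
bucket_prefixes = ["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/16"]

# Hypothetical stream of observed UIDs.
stream = ["10.1.2.3", "10.2.0.1", "10.1.9.9", "192.168.5.5", "8.8.8.8"]

result = histogram(stream, bucket_prefixes)
print(result)
```

In this sketch a monitor would only need to ship the short list of prefixes and their counts instead of the full lookup table, which is the bandwidth saving the abstract describes. Choosing which prefixes to use so that a given error metric is minimized is the part the paper addresses and is not attempted here.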