Information theory for data management

  • Authors:
  • Divesh Srivastava;Suresh Venkatasubramanian

  • Affiliations:
  • AT&T Labs--Research;University of Utah

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

We are awash in data. The explosion in computing power and computing infrastructure allows us to generate multitudes of data, in differing formats, at different scales, and in inter-related areas. Data management is fundamentally about the harnessing of this data to extract information, discovering good representations of the information, and analyzing information sources to glean structure. Data management generally presents us with cost-benefit tradeoffs. If we store more information, we get better answers to queries, but we pay the price in terms of increased storage. Conversely, reducing the amount of information we store improves performance at the cost of decreased accuracy for query results. The ability to quantify information gain or loss can only improve our ability to design good representations, storage mechanisms, and analysis tools for data.