Offline and data stream algorithms for efficient computation of synopsis structures

Authors:
Sudipto Guha;Kyuseok Shim
Affiliations:
University of Pennsylvania;Seoul National University
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Citing 0
Cited 1

Efficient trade-off between speed processing and accuracy in summarizing data streams

PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

Synopsis and small space representations are important data analysis tools and have long been used OLAP/DSS systems, approximate query answering, query optimization and data mining. These techniques represent the input in terms broader characteristics and improve efficiency of various applications, e.g., learning, classification, event detection, among many others. In a recent past, the synopsis techniques have gained more currency due to the emerging areas like data stream management.In this tutorial, we propose to revisit algorithms for Wavelet and Histogram synopsis construction. In the recent years, a significant number of papers have appeared which has advanced the state-of-the-art in synopsis construction considerably. In particular, we have seen the development of a large number of efficient algorithms which are also guaranteed to be near optimal. Furthermore, these synopsis construction problems have found deep roots in theory and database systems, and have influenced a wide range of problems. In a different level, a large number of the synopsis construction algorithms use a similar set of techniques. It is extremely valuable to discuss and analyze these techniques, and we expect broader pictures and paradigms to emerge. This would allow us to develop algorithms for newer problems with greater ease. Understanding these recurrent themes and intuition behind the development of these algorithms is one of the main thrusts of the tutorial.Our goal will be to cover a wide spectrum of these topics and make the researchers in VLDB community aware of the new algorithms, optimum or approximate, offline or streaming. The tutorial will be self contained and develop most of the mathematical and database backgrounds needed.