Problem determination in a database management system can be difficult, given the complexity of the system and the large amount of data that must be collected and analyzed. Monitoring the system to collect this data incurs overhead and degrades application performance. As an alternative to the standard practice of storing the performance data and analyzing it offline, we examine an approach in which monitoring data is produced as a continuous data stream and data stream mining techniques are applied to it. We implement this approach in a prototype system called Tempo on IBM DB2®. Tempo implements Top-K analysis, a task that database administrators commonly perform for problem determination. Top-K analysis typically identifies the most frequently occurring events or the highest consumers of system resources. Our experimental evaluation indicates that Tempo is time- and space-efficient, incurs low overhead, and produces accurate results.
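
To make the idea of Top-K analysis over a monitoring stream concrete, the following is a minimal sketch of one well-known bounded-memory technique, the Space-Saving algorithm. The abstract does not say which algorithm Tempo uses, so this should be read as an illustration under assumptions rather than a description of Tempo. The function name `space_saving_top_k`, the `capacity` parameter, and the event labels are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving_top_k(
    stream: Iterable[Hashable], k: int, capacity: int
) -> List[Tuple[Hashable, int]]:
    """Approximate the k most frequent events using at most `capacity` counters.

    Illustrative sketch of the Space-Saving algorithm; not Tempo's actual code.
    Counts are upper bounds: an event that replaced an evicted one inherits
    the evicted counter's value.
    """
    if capacity < k:
        raise ValueError("capacity must be at least k")

    counts: Dict[Hashable, int] = {}
    for event in stream:
        if event in counts:
            # Event is already monitored: increment its counter.
            counts[event] += 1
        elif len(counts) < capacity:
            # Free counter available: start monitoring the event.
            counts[event] = 1
        else:
            # No free counter: evict the event with the smallest count and
            # give the new event that count plus one.
            victim = min(counts, key=counts.get)
            counts[event] = counts.pop(victim) + 1

    # Report the k monitored events with the largest (estimated) counts.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical stream of monitored events (for example, statement identifiers).
events = ["stmt_a"] * 5 + ["stmt_b"] * 3 + ["stmt_c"]
top2 = space_saving_top_k(events, k=2, capacity=3)
```

Memory use here depends on `capacity` rather than on the length of the stream, which is the property that makes this family of techniques suitable for continuous monitoring. The linear scan for the minimum counter keeps the sketch short; a production implementation would typically use a structure that finds the minimum in constant time.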