The anatomy of a stream processing system

  • Authors:
  • Altaf Gilani;Satyajeet Sonune;Balakumar Kendai;Sharma Chakravarthy

  • Affiliations:
  • Information Technology Laboratory, and Department of Computer Science and Engineering, The University of Texas at Arlington (all authors)

  • Venue:
  • BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
  • Year:
  • 2006

Abstract

Data-intensive applications such as network monitoring, financial applications, and sensor-based applications need to be supported by general-purpose systems rather than customized implementations. Their input is a continuous, unpredictable, and unbounded flow of data, referred to as a stream. The fact that data arrives as a stream with varying input rates (instead of being read from disk in a predictable way) and that quality of service (QoS) requirements are stringent for these applications warrants a re-examination of the fundamental architecture of a DBMS. This paper describes the basic processing model and architecture of MavStream, a new Data Stream Management System (DSMS) being developed at UT Arlington. The architecture of MavStream is the primary focus of this paper. The user can submit a continuous query through a graphical user interface (GUI); the query is then instantiated, scheduled, and executed by the MavStream server. We first provide an overview of the basic model and architecture and then describe some of the components of the system. We provide experimental results to demonstrate the utility of the system and the effect of different scheduling strategies and buffer sizes on performance and output.