Designing an inductive data stream management system: the stream mill experience

  • Authors:
  • Hetal Thakkar;Barzan Mozafari;Carlo Zaniolo

  • Affiliations:
  • University of California at Los Angeles;University of California at Los Angeles;University of California at Los Angeles

  • Venue:
  • SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

There has been much recent interest in on-line data mining. Existing mining algorithms designed for stored data are either not applicable or not effective on data streams, where real-time response is often needed and data characteristics change frequently. Therefore, researchers have been focusing on designing new and improved algorithms for on-line mining tasks, such as classification, clustering, frequent itemsets mining, pattern matching, etc. Relatively little attention has been paid to designing DSMSs, which facilitate and integrate the task of mining data streams---i.e., stream systems that provide Inductive functionalities analogous to those provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS---a system that besides providing a rich library of inter-operable functions to support the whole mining process, also supports the essentials of DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease-of-use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language where mining methods are expressed via aggregates for generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and introduce new ones.