Designing an inductive data stream management system: the Stream Mill experience

Authors:
Hetal Thakkar; Barzan Mozafari; Carlo Zaniolo
Affiliations:
University of California at Los Angeles; University of California at Los Angeles; University of California at Los Angeles
Venue:
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing systems
Year:
2008

Abstract

There has been much recent interest in on-line data mining. Mining algorithms designed for stored data are either inapplicable or ineffective on data streams, where real-time response is often required and data characteristics change frequently. Researchers have therefore focused on designing new and improved algorithms for on-line mining tasks such as classification, clustering, frequent itemset mining, and pattern matching. Relatively little attention has been paid to designing data stream management systems (DSMSs) that facilitate and integrate the task of mining data streams, i.e., stream systems that provide inductive functionality analogous to that provided by Weka and MS OLE DB for stored data. In this paper, we propose the notion of an Inductive DSMS: a system that, besides providing a rich library of interoperable functions to support the whole mining process, also supports the essentials of a DSMS, including optimization of continuous queries, load shedding, synoptic constructs, and non-stop computing. Ease of use and extensibility are additional desiderata for the proposed Inductive DSMS. We first review the many challenges involved in realizing such a system and then present our approach of extending the Stream Mill DSMS toward that goal. Our system features (i) a powerful query language in which mining methods are expressed via aggregates over generic streams and arbitrary windows, (ii) a library of fast and light mining algorithms, and (iii) an architecture that makes it easy to customize and extend existing mining methods and to introduce new ones.
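
To make the idea of expressing a mining method as an aggregate over a windowed stream more concrete, the following Python sketch may help. It is not Stream Mill's query language or API. The class `FrequentItemsAggregate`, its `iterate`/`expire`/`terminate` methods, and the helper `run_continuous` are hypothetical names chosen for illustration, and the mining task (frequent single items over a count-based sliding window) is an assumed example rather than an algorithm taken from the paper.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List


class FrequentItemsAggregate:
    """Hypothetical windowed aggregate: frequent items over the last `size` tuples."""

    def __init__(self, size: int, min_support: float) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.min_support = min_support
        self.window: Deque[str] = deque()   # tuples currently inside the window
        self.counts: Counter = Counter()    # per-item counts for the window

    def iterate(self, item: str) -> None:
        """Fold one arriving tuple into the state; expire the oldest if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            self.expire(self.window.popleft())

    def expire(self, item: str) -> None:
        """Undo the contribution of a tuple that has left the window."""
        self.counts[item] -= 1
        if self.counts[item] == 0:
            del self.counts[item]

    def terminate(self) -> List[str]:
        """Report items whose count reaches min_support of the current window."""
        threshold = self.min_support * len(self.window)
        return sorted(i for i, c in self.counts.items() if c >= threshold)


def run_continuous(stream: Iterable[str],
                   aggregate: FrequentItemsAggregate) -> Iterator[List[str]]:
    """Continuous-query driver (assumed): emit a result after every arriving tuple."""
    for item in stream:
        aggregate.iterate(item)
        yield aggregate.terminate()


if __name__ == "__main__":
    agg = FrequentItemsAggregate(size=4, min_support=0.5)
    for frequent in run_continuous(["a", "b", "a", "c", "a", "b"], agg):
        print(frequent)
```

Keeping the per-tuple update (`iterate`) separate from the undo step (`expire`) lets the state be maintained incrementally as the window slides, instead of recomputing the counts from scratch for every result.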