Data stream mining and resource adaptive computation

  • Authors:
  • Philip S. Yu

  • Affiliations:
  • IBM T.J. Watson Research Center, Hawthorne, NY

  • Venue:
  • DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of data streams has gained importance in recent years because of advances in hardware technology. These advances have made it easy to store and record numerous transactions and activities in everyday life in an automated way. The ubiquitous presence of data streams in a number of practical domains has generated a lot of research in this area. Example applications include trade surveillance for security fraud and money laundering, network monitoring for intrusion detection, bio-surveillance for terrorist attack, and others. Data is viewed as a continuous stream in this kind of applications. Problems such as data mining which have been widely studied for traditional data sets cannot be easily solved for the data stream domain. This is because the large volume of data arriving in a stream renders most algorithms to inefficient as most mining algorithms require multiple scans of data which is unrealistic for stream data. More importantly, the characteristics of the data stream can change over time and the evolving pattern needs to be captured. Furthermore, we need to consider the problem of resource allocation in mining data streams. Due to the large volume and the high speed of streaming data, mining algorithms must cope with the effects of system overload. Thus, how to achieve optimum results under various resource constraints becomes a challenging task. In this talk, I'll provide an overview, discuss the issues and focus on how to mine evolving data streams and perform resource adaptive computation.