Change detection in learning histograms from data streams

Authors:
Raquel Sebastião;João Gama
Affiliations:
LIAAD, INESC Porto L.A., University of Porto, Porto, Portugal;LIAAD, INESC Porto L.A., University of Porto, Porto, Portugal and Faculty of Economics, University of Porto, Porto, Portugal
Venue:
EPIA'07 Proceedings of the 13th Portuguese Conference on Progress in Artificial Intelligence
Year:
2007

Abstract

In this paper we study the problem of constructing histograms from high-speed, time-changing data streams. Learning in this context requires processing each example once, at the rate it arrives, maintaining a histogram consistent with the most recent data, and forgetting outdated data whenever a change in the distribution is detected. To construct histograms from high-speed data streams we use the two-layer structure of the Partition Incremental Discretization (PiD) algorithm. Our contribution is a new method to detect when a change occurs in the distribution generating the examples. The basic idea is to monitor distributions from two different time windows: the reference time window, which reflects the distribution observed in the past, and the current time window, which reflects the distribution observed in the most recent data. We compare the two distributions and signal a change whenever the difference between them exceeds a threshold value, using three different measures: the Entropy Absolute Difference, the Kullback-Leibler divergence and the Cosine Distance. The experimental results suggest that the Kullback-Leibler divergence exhibits a high probability of change detection and faster detection, with few false-positive alarms.
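
The window comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the direction of the Kullback-Leibler divergence (current window relative to reference window), the `eps` guard for empty reference bins, the bin counts, and the threshold of 0.1 are all assumptions chosen for illustration. Both windows are assumed to be summarized as histograms over the same bins.

```python
import math


def normalize(counts):
    """Turn raw bin counts into relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram window is empty")
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in bits; empty bins contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_abs_diff(p, q):
    """Absolute difference between the entropies of two distributions."""
    return abs(entropy(p) - entropy(q))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits. eps is an assumed guard against empty bins in q."""
    return sum(x * math.log2(x / max(y, eps)) for x, y in zip(p, q) if x > 0)


def cosine_distance(p, q):
    """One minus the cosine similarity of the two frequency vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return 1.0 - dot / norm


def change_detected(ref_counts, cur_counts, measure, threshold):
    """Signal a change when the chosen measure exceeds the threshold."""
    cur = normalize(cur_counts)
    ref = normalize(ref_counts)
    return measure(cur, ref) > threshold


# Hypothetical bin counts for the reference and current windows.
reference = [10, 20, 40, 20, 10]
current = [40, 30, 15, 10, 5]

for measure in (entropy_abs_diff, kl_divergence, cosine_distance):
    flag = change_detected(reference, current, measure, threshold=0.1)
    print(measure.__name__, flag)
```

In practice each measure has its own scale, so a single shared threshold is only a convenience here; a separate threshold per measure would be the more realistic choice.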