A random method for quantifying changing distributions in data streams

Authors:
Haixun Wang;Jian Pei
Affiliations:
IBM T. J. Watson Research Center;Simon Fraser University, Canada
Venue:
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2005

Citing 9
Cited 4

Elements of information theory

Elements of information theory
Mining time-changing data streams

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Random Forests

Machine Learning
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

Machine Learning
Neural Network Ensembles

IEEE Transactions on Pattern Analysis and Machine Intelligence
Near Neighbor Search in Large Metric Spaces

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
A framework for diagnosing changes in evolving data streams

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Mining concept-drifting data streams using ensemble classifiers

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Online novelty detection on temporal sequences

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Effective variation management for pseudo periodical streams

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Self-tuning query mesh for adaptive multi-route query processing

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
PGG: an online pattern based approach for stream variation management

Journal of Computer Science and Technology
Constrained logistic regression for discriminative pattern mining

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

In applications such as fraud and intrusion detection, it is of great interest to measure the evolving trends in the data. We consider the problem of quantifying changes between two datasets with class labels. Traditionally, changes are often measured by first estimating the probability distributions of the given data, and then computing the distance, for instance, the K-L divergence, between the estimated distributions. However, this approach is computationally infeasible for large, high dimensional datasets. The problem becomes more challenging in the streaming data environment, as the high speed makes it difficult for the learning process to keep up with the concept drifts in the data. To tackle this problem, we propose a method to quantify concept drifts using a universal model that incurs minimal learning cost. In addition, our model also provides the ability of performing classification.