A Cluster-Based Context-Tree Model for Multivariate Data Streams with Applications to Anomaly Detection

Authors:
Pierre Brice; Wei Jiang; Guohua Wan
Affiliations:
Motorola, Hackensack, New Jersey 07601; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200052, China; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200052, China
Venue:
INFORMS Journal on Computing
Year:
2011

Abstract

Many applications, such as telecommunication and commercial video broadcasting systems, computer networks, and Web mining, require modeling data streams that exhibit context dependency. Context dependency means that the statistical distribution of a new sample is strongly conditioned on the set of most recent samples that precede it. However, statistical models that capture context dependency, such as context trees (CTs), tend to scale poorly. This paper addresses this scalability problem by transforming a data stream into high-level aggregates of clusters and modeling those aggregates instead of the original stream. Using an information-theoretic approach, we leverage existing clustering techniques for static categorical data sets to model dynamic data streams with CTs. Because the approach can be applied repeatedly at different levels of a clustering hierarchy, it supports trend prediction and anomaly detection at whatever aggregate (or detail) level is required. Experimental results, including video stream modeling, network intrusion detection, and Monte Carlo simulations, show that the proposed method efficiently captures high-level aggregates of large-scale dynamic systems and is highly effective for trend prediction and anomaly detection.
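The general idea in the abstract (map each multivariate sample to a cluster label, model the resulting label stream with a context tree, and flag labels that the model finds improbable) can be illustrated with a minimal sketch. This is not the authors' algorithm. The fixed centroids, the fixed maximum context depth, the "longest previously seen context" back-off rule, and the Laplace estimator are all assumptions chosen for brevity, and the names `nearest_cluster`, `ContextTree`, and `anomaly_scores` are hypothetical. The anomaly score is the code length $-\log_2 P(x_t \mid \text{context})$ in bits, which is one simple information-theoretic way to measure surprise.

```python
import math
from collections import defaultdict


def nearest_cluster(sample, centroids):
    """Map a multivariate sample to the index of its closest centroid (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(sample, centroids[k])))


class ContextTree:
    """Hypothetical fixed-depth context-count model over cluster labels.

    Each key is a context: a tuple of the most recent labels, newest first.
    Its value counts how often each label followed that context.
    """

    def __init__(self, alphabet_size, max_depth):
        self.m = alphabet_size
        self.d = max_depth
        self.counts = defaultdict(lambda: [0] * alphabet_size)

    def _context(self, history, length):
        # The last `length` labels of the history, ordered newest first.
        return tuple(reversed(history[len(history) - length:]))

    def update(self, history, symbol):
        # Count `symbol` under every context suffix from length 0 to max_depth.
        for length in range(min(self.d, len(history)) + 1):
            self.counts[self._context(history, length)][symbol] += 1

    def prob(self, history, symbol):
        # Back off to the longest context seen so far (an assumed, simple rule).
        for length in range(min(self.d, len(history)), -1, -1):
            ctx = self._context(history, length)
            if ctx in self.counts:  # `in` does not create defaultdict entries
                c = self.counts[ctx]
                return (c[symbol] + 1) / (sum(c) + self.m)  # Laplace estimator
        return 1.0 / self.m  # nothing observed yet: uniform prediction


def anomaly_scores(stream, centroids, max_depth=2):
    """Cluster the stream, then score each label by its surprise in bits."""
    labels = [nearest_cluster(x, centroids) for x in stream]
    tree = ContextTree(len(centroids), max_depth)
    scores = []
    for t, sym in enumerate(labels):
        history = labels[max(0, t - max_depth):t]  # only the last max_depth labels matter
        scores.append(-math.log2(tree.prob(history, sym)))  # predict, then learn
        tree.update(history, sym)
    return labels, scores


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    # An alternating pattern between the two clusters, broken at the very end.
    stream = [(0.1, -0.2), (9.8, 10.1)] * 20 + [(0.2, 0.1), (0.0, 0.3)]
    labels, scores = anomaly_scores(stream, centroids)
    print(labels[-4:], [round(s, 2) for s in scores[-4:]])
```

In this toy stream the final label breaks the learned alternation, so it receives the largest score (about 4.39 bits, versus well under 1 bit for labels that follow the pattern). Because the model works on cluster labels rather than raw samples, its size depends on the number of clusters and the context depth rather than on the dimensionality of the data; rerunning it with coarser or finer centroids corresponds loosely to the different levels of a clustering hierarchy mentioned in the abstract.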