Real-time change-point detection using sequentially discounting normalized maximum likelihood coding

Authors:
Yasuhiro Urabe;Kenji Yamanishi;Ryota Tomioka;Hiroki Iwai
Affiliations:
Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Little eArth Corporation Co., Ltd, Tokyo, Japan
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Year:
2011

Abstract

We address real-time change-point detection in time series. This problem has recently received considerable attention in data mining because it applies to a wide variety of important risk management issues, such as detecting failures of computer devices from computer performance data and detecting masqueraders or malicious executables from computer access logs. In this paper we propose a new method of real-time change-point detection that employs sequentially discounting normalized maximum likelihood (SDNML) coding. SDNML is a method for sequential data compression, which we newly develop in this paper. It attains the least code length for the sequence, and the effect of past data is gradually discounted as time goes on, so that the data compression adapts to non-stationary data sources. In our method, SDNML is used to learn the mechanism of a time series, and a change-point score at each time is then measured in terms of the SDNML code length. We empirically demonstrate the significant superiority of our method over existing methods, such as the predictive-coding method and the hypothesis-testing method, in terms of detection accuracy and computational efficiency on artificial data sets. We further apply our method to a real security problem, malware detection, and empirically demonstrate that it is able to detect unseen security incidents at significantly early stages.
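The general idea of scoring change points by code length under a discounted model can be illustrated with a minimal sketch. This is not the SDNML code itself: it substitutes a simpler plug-in predictive code on an i.i.d. Gaussian model whose mean and variance are updated with exponential discounting, and it omits the normalized maximum likelihood normalization that the abstract describes. The function names, the discounting rate `r`, the smoothing `window`, the `warmup` length, and the synthetic series are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def change_scores(xs, r=0.05, window=5, warmup=20, var_floor=1e-6):
    """Return one change score per sample (0.0 until enough history exists).

    Simplified stand-in for code-length scoring: each sample is coded with a
    Gaussian whose parameters were learned from the past only, and the score
    is the average code length (in nats) over the last `window` samples.
    """
    # Initialise the model from a short warm-up prefix (an assumption).
    head = xs[:warmup]
    mean = sum(head) / len(head)
    var = max(sum((x - mean) ** 2 for x in head) / len(head), var_floor)

    recent = deque(maxlen=window)
    scores = [0.0] * len(xs)
    for t in range(warmup, len(xs)):
        err = xs[t] - mean
        # Code length of the new sample: negative log predictive density.
        code_len = 0.5 * math.log(2 * math.pi * var) + err * err / (2 * var)
        recent.append(code_len)
        if len(recent) == window:
            scores[t] = sum(recent) / window
        # Discounted update: the weight of old data decays by (1 - r) per step.
        mean += r * err
        var = max((1 - r) * var + r * err * err, var_floor)
    return scores


def make_demo_series(seed=0, n=200, shift=10.0):
    """Hypothetical test series: the mean jumps from 0 to `shift` at index n."""
    rng = random.Random(seed)
    before = [rng.gauss(0.0, 1.0) for _ in range(n)]
    after = [rng.gauss(shift, 1.0) for _ in range(n)]
    return before + after


if __name__ == "__main__":
    scores = change_scores(make_demo_series())
    peak = max(range(len(scores)), key=lambda t: scores[t])
    print("highest score at index", peak)
```

In this sketch a larger `r` forgets the past faster, which speeds up adaptation after a change but makes the score noisier on stationary stretches. The score should rise sharply just after the mean shift, because samples from the new regime are expensive to encode under parameters learned from the old one.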