Real-time change-point detection using sequentially discounting normalized maximum likelihood coding

  • Authors:
  • Yasuhiro Urabe;Kenji Yamanishi;Ryota Tomioka;Hiroki Iwai

  • Affiliations:
  • Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Faculty of Medicine, University of Miyazaki and The University of Tokyo, Tokyo, Japan;Little eArth Corporation Co., Ltd, Tokyo, Japan

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
  • Year:
  • 2011

Abstract

We are concerned with real-time change-point detection in time series. This technology has recently received considerable attention in data mining because it applies to a wide variety of important risk management problems, such as detecting failures of computer devices from computer performance data and detecting masqueraders or malicious executables from computer access logs. In this paper we propose a new method of real-time change-point detection that employs sequentially discounting normalized maximum likelihood (SDNML) coding. SDNML is a method for sequential data compression, which we newly develop in this paper. It attains the least code length for the sequence, and the effect of past data is gradually discounted as time goes on, so the compression adapts to non-stationary data sources. In our method, SDNML is used to learn the mechanism of a time series, and a change-point score at each time point is then measured in terms of the SDNML code length. We empirically demonstrate the significant superiority of our method over existing methods, such as the predictive-coding method and the hypothesis testing method, in terms of detection accuracy and computational efficiency on artificial data sets. We further apply our method to a real security problem, malware detection, and empirically demonstrate that it is able to detect unseen security incidents at significantly early stages.
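
The abstract does not give the SDNML update equations, so the following is only a rough sketch of the general idea it describes: learn a model sequentially, discount past data, and score each point by its code length. It is not the authors' SDNML. As a simplification, it substitutes a discounted Gaussian plug-in predictor for the normalized maximum likelihood code. The function name `discounted_code_lengths`, the discounting rate `r`, and the floor `var_floor` are hypothetical names chosen for this illustration.

```python
import math


def discounted_code_lengths(xs, r=0.05, var_floor=1e-6):
    """Return a code-length score for every point of the series xs.

    Assumption: a Gaussian plug-in predictor with exponentially discounted
    mean and variance stands in for the SDNML code; this is a simplification.
    """
    mean, var = xs[0], 1.0
    scores = [0.0]  # the first point has no past data to be predicted from
    for x in xs[1:]:
        # Code length in nats: -log p(x | past) under the current Gaussian.
        scores.append(0.5 * math.log(2 * math.pi * var)
                      + (x - mean) ** 2 / (2 * var))
        # Discounted update: older statistics are down-weighted by (1 - r).
        delta = x - mean
        mean += r * delta
        var = max((1 - r) * (var + r * delta * delta), var_floor)
    return scores


# Synthetic series: small oscillation around 0, then around 10 from index 100.
series = ([0.5 * (-1) ** i for i in range(100)]
          + [10 + 0.5 * (-1) ** i for i in range(100)])
scores = discounted_code_lengths(series)
peak = max(range(len(scores)), key=scores.__getitem__)
print("largest code length at index", peak)
```

A point that the current model predicts poorly receives a long code length, so peaks in the score mark candidate change points. A larger `r` forgets the past faster and adapts more quickly after a change, at the cost of noisier estimates.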