Finding simple intensity descriptions from event sequence data

Authors:
Heikki Mannila; Marko Salmenkivi
Affiliations:
Nokia Research Center, FIN-00045 Nokia Group, Finland; University of Helsinki, Finland
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

Sequences of events are an important type of data arising in various applications, including telecommunications, biostatistics, and web access analysis. A basic approach to modeling such sequences is to find the underlying intensity functions describing the expected number of events per time unit. Typically, the intensity functions are assumed to be piecewise constant. We consider different ways of fitting such intensity models to event sequence data. We start by considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods with a varying number of pieces. These methods can be used to produce posterior distributions on the intensity functions, and they can also accommodate covariates. The drawback is that they are computationally intensive and thus not very suitable for data mining applications in which large numbers of intensity functions have to be estimated. We then consider dynamic programming approaches to finding the change points in the intensity functions. These methods can find the maximum likelihood intensity function in O(n^2 k) time for a sequence of n events and k different pieces of intensity. We show that simple heuristics can be used to prune the number of potential change points, yielding speedups of several orders of magnitude. The results of the improved dynamic programming method correspond very closely to the posterior averages produced by the MCMC methods.
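
The dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Poisson process likelihood, restricts candidate change points to the observed event times, and fits exactly k pieces. The names `segment_loglik` and `fit_piecewise_constant` are hypothetical, and the pruning heuristics mentioned in the abstract are not included.

```python
import bisect
import math


def segment_loglik(count, length):
    """Maximized Poisson log-likelihood of one constant piece.

    With `count` events on an interval of the given length, the best
    constant rate is count / length, giving count * log(rate) - count.
    """
    if count == 0:
        return 0.0
    return count * math.log(count / length) - count


def fit_piecewise_constant(events, t_start, t_end, k):
    """Fit a k-piece constant intensity on [t_start, t_end] by dynamic programming.

    Assumption: all events lie in [t_start, t_end], and change points may
    only be placed at event times. Returns (log_likelihood, pieces), where
    each piece is a tuple (left, right, rate).
    """
    events = sorted(events)
    inner = sorted({t for t in events if t_start < t < t_end})
    bounds = [t_start] + inner + [t_end]
    # cum[i] = number of events strictly before bounds[i]
    cum = [bisect.bisect_left(events, b) for b in bounds]
    cum[-1] = len(events)  # events exactly at t_end belong to the last piece
    m = len(bounds) - 1    # number of elementary intervals
    k = min(k, m)

    neg_inf = float("-inf")
    # best[j][i]: best log-likelihood covering bounds[0..i] with j pieces
    best = [[neg_inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, m + 1):
            for h in range(j - 1, i):
                if best[j - 1][h] == neg_inf:
                    continue
                score = best[j - 1][h] + segment_loglik(
                    cum[i] - cum[h], bounds[i] - bounds[h]
                )
                if score > best[j][i]:
                    best[j][i] = score
                    back[j][i] = h

    # Walk the back-pointers to recover the pieces from right to left.
    pieces = []
    i = m
    for j in range(k, 0, -1):
        h = back[j][i]
        rate = (cum[i] - cum[h]) / (bounds[i] - bounds[h])
        pieces.append((bounds[h], bounds[i], rate))
        i = h
    pieces.reverse()
    return best[k][m], pieces
```

The three nested loops give the O(n^2 k) running time quoted above: k piece counts, and for each one a pass over all pairs of candidate boundaries. Pruning the candidate set, as the abstract describes, would shrink the two inner loops.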