Using information-theoretic principles to discover interesting episodes in a time-ordered input sequence

  • Authors:
  • Edwin O. Heierman, III; Diane J. Cook

  • Affiliations:
  • The University of Texas at Arlington; The University of Texas at Arlington

  • Venue:
  • Using information-theoretic principles to discover interesting episodes in a time-ordered input sequence
  • Year:
  • 2004

Abstract

Knowledge discovery techniques can be applied to discover interesting patterns of interactions contained in a temporal sequence. Existing approaches use frequency, and sometimes length, as measures of interestingness. Because the input sequences are temporal, additional characteristics, such as periodicity, may also be interesting. In addition, current techniques provide no means of evaluating one collection of interesting patterns against another. Such an evaluation measure would be useful for determining whether one collection of discovered patterns is more interesting than another, a situation that arises when a technique can produce more than one set of interesting patterns depending on its algorithm parameters. We propose that information-theoretic principles can be used to evaluate interesting characteristics of time-ordered input sequences. With such an approach, additional characteristics can be discovered and an evaluation measure for the discovered patterns can be provided. In this dissertation, we present a novel data mining technique, called Episode Discovery (ED), based on the Minimum Description Length (MDL) principle. ED discovers patterns with interesting features in a time-ordered sequence by computing a compression ratio for a description of the input sequence based on the discovered patterns. First, we present the MDL foundation of our approach, as well as the details of our algorithm. We then demonstrate multiple capabilities of the algorithm, such as the use of the evaluation measure for the discovered patterns and the use of the algorithm in a real-time environment. Finally, we present two case studies in which ED was integrated with components from an intelligent environment. The first case study shows that ED can be used to improve the performance of a predictor, while the second shows the benefits of integrating the technique with a decision maker.
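
The idea of scoring a pattern by how well it compresses the input sequence can be illustrated with a small sketch. This is not the encoding used by ED, which the abstract does not specify; it assumes a toy description in which the pattern is stored once and every non-overlapping occurrence in the sequence is replaced by a single placeholder symbol, with description length measured as a plain symbol count. The function names `description_length` and `compression_ratio` and the sample events are hypothetical, and periodicity is not modeled.

```python
from typing import Hashable, Sequence


def description_length(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> int:
    """Symbol count of a toy two-part description (an assumption, not ED's encoding).

    The description is the pattern definition (stored once) plus the sequence
    with each non-overlapping occurrence replaced by one placeholder symbol.
    If the pattern is empty or never occurs, the sequence is kept as-is.
    """
    n, m = len(sequence), len(pattern)
    if m == 0:
        return n
    target = tuple(pattern)
    rewritten = 0      # symbols in the rewritten sequence
    occurrences = 0    # non-overlapping matches found, scanning left to right
    i = 0
    while i < n:
        if tuple(sequence[i:i + m]) == target:
            occurrences += 1
            rewritten += 1   # the whole occurrence becomes one placeholder
            i += m
        else:
            rewritten += 1   # the symbol is copied unchanged
            i += 1
    if occurrences == 0:
        return n             # no reason to pay for an unused pattern definition
    return m + rewritten


def compression_ratio(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> float:
    """Original length divided by described length; values above 1 mean compression."""
    if len(sequence) == 0:
        raise ValueError("sequence must not be empty")
    return len(sequence) / description_length(sequence, pattern)


# Hypothetical event stream, for illustration only.
events = ["wake", "coffee", "news", "wake", "coffee", "news",
          "phone", "wake", "coffee", "news"]
candidates = [("wake", "coffee", "news"), ("coffee", "news"), ("phone", "wake")]

# Rank candidate patterns by how much each one compresses the sequence.
ranked = sorted(candidates, key=lambda p: compression_ratio(events, p), reverse=True)
best = ranked[0]
```

Under this toy measure a pattern is worth keeping only if its ratio exceeds 1, and the same ratio gives a single number for ranking candidate patterns, which is the kind of evaluation measure the abstract argues existing frequency-based approaches lack.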