Mining Local Correlation Patterns in Sets of Sequences

  • Authors:
  • Antti Ukkonen

  • Affiliations:
  • Helsinki University of Technology & HIIT,

  • Venue:
  • DS '09 Proceedings of the 12th International Conference on Discovery Science
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where a number of the sequences output exactly the same substring at the same time. Such substrings, together with the sequences in which they are contained, form a local correlation pattern . In practice we only want to find patterns that are longer than *** and appear in at least *** sequences. Our main contribution is an algorithm for mining such patterns in an online case, where the sequences are read in parallel one symbol at a time (no random access) and the patterns must be reported as soon as they occur. We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset. We show that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.