Mining Local Correlation Patterns in Sets of Sequences

Authors:
Antti Ukkonen
Affiliations:
Helsinki University of Technology & HIIT
Venue:
DS '09 Proceedings of the 12th International Conference on Discovery Science
Year:
2009

Abstract

Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where several of the sequences output exactly the same substring at the same time. Such a substring, together with the sequences that contain it, forms a local correlation pattern. In practice we only want to find patterns that are longer than *** and appear in at least *** sequences. Our main contribution is an algorithm for mining such patterns in the online case, where the sequences are read in parallel one symbol at a time (no random access) and the patterns must be reported as soon as they occur. We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset, showing that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.
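To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper. It simply keeps the last `min_len` symbols of every sequence and, at each time step, groups the sequences whose windows are identical. The names `local_correlations`, `min_len`, and `min_support` are hypothetical: the two thresholds stand in for the elided values above, and treating the length threshold as inclusive is an assumption. Each element of `streams` is assumed to be one column, i.e. the tuple of single-character symbols emitted by all sequences at time t. The function therefore consumes the input strictly in parallel, with no random access, and can run on an unbounded generator.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, FrozenSet, Iterable, Iterator, List, Sequence, Tuple


def local_correlations(
    streams: Iterable[Sequence[str]],
    min_len: int,
    min_support: int,
) -> Iterator[Tuple[int, str, FrozenSet[int]]]:
    """Hypothetical brute-force detector (not the paper's algorithm).

    Yields (t, substring, sequence_ids) whenever at least `min_support`
    sequences emitted the same `min_len` symbols in the window ending at t.
    Assumes every symbol is a single-character string.
    """
    windows: List[Deque[str]] = []
    for t, column in enumerate(streams):
        # Lazily create one bounded window per sequence on the first column.
        if not windows:
            windows = [deque(maxlen=min_len) for _ in column]
        # Group sequence indices by the content of their current window.
        groups: Dict[str, List[int]] = defaultdict(list)
        for i, symbol in enumerate(column):
            windows[i].append(symbol)  # the deque drops the oldest symbol itself
            if len(windows[i]) == min_len:
                groups["".join(windows[i])].append(i)
        # Report every shared substring with enough supporting sequences.
        for substring, ids in groups.items():
            if len(ids) >= min_support:
                yield t, substring, frozenset(ids)
```

For example, with `seqs = ["abcab", "xbcaz", "ybcaq"]`, iterating over `local_correlations(zip(*seqs), min_len=3, min_support=2)` yields a single report, `(3, "bca", frozenset({0, 1, 2}))`, at the moment all three sequences finish emitting `bca`. A pattern that persists longer than `min_len` is reported again at every later step as overlapping windows, and each step costs O(n · min_len) time for n sequences, so this baseline is only meant to illustrate the definition.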