Pattern selection problems in multivariate time-series using equation discovery

Authors:
Arne Koopman;Arno Knobbe;Marvin Meeng
Affiliations:
LIACS, Universiteit Leiden;LIACS, Universiteit Leiden;LIACS, Universiteit Leiden
Venue:
Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Year:
2010

Citing 5
Cited 0

Declarative Bias in Equation Discovery
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

Maximally informative k-itemsets and their efficient discovery
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Dynamic Bayesian Networks for Real-Time Classification of Seismic Signals
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases

The Chosen Few: On Identifying Valuable Patterns
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

InfraWatch: data management of large systems for monitoring infrastructural performance
IDA'10 Proceedings of the 9th international conference on Advances in Intelligent Data Analysis

Abstract

In this paper, we present a method for pattern selection in collections of patterns discovered in multivariate time-series. Because our data is continuous in nature, the pattern language we consider is somewhat out of the ordinary compared to the discrete patterns commonly considered in the data mining field. An equation discovery system is employed to generate either regular algebraic equations or more complex differential equations. As the equation discovery system generates a collection of equations per target variable, and we require equations for each variable, we are dealing with an abundance of equations, quite likely with serious levels of redundancy. The method presented here selects a subset of equations by considering to what extent the different variables are covered by the selected equations, while optimizing the relevance of variables within the equations. As such, the equation selection method returns a concise set of equations that captures the dependencies between the different time-series well, while minimizing redundancy. The work in this paper is inspired by the new InfraWatch project, which deals with high-resolution sensor data from a highway bridge. The 145 sensors (sensing structural characteristics such as stretch, vibration and temperature) are distributed fairly densely over the bridge, such that adjacent sensors are likely to show correlated signals. Especially in an exploratory setting, one would be interested in a small collection of prototype sensors with associated equations for how these prototypes are related to other sensors in the vicinity. In the experimental section, we demonstrate how the sensors can be modeled by (differential) equations, and how the equation selection method picks relevant equations that model structural properties of the bridge sensibly.
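The abstract describes the selection step only in outline: choose a subset of equations that covers the variables while favouring relevant variables and avoiding redundancy. As a rough illustration of that idea, and not the authors' actual algorithm, the following Python sketch uses a simple greedy heuristic. The function name `select_equations`, the per-variable relevance weights, and the sensor names `s1` to `s4` are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Set

# Hypothetical representation: an equation maps each variable it mentions
# to an assumed relevance weight in [0, 1]. All values are illustrative.
Equation = Dict[str, float]


def select_equations(candidates: Dict[str, Equation],
                     variables: Set[str]) -> List[str]:
    """Greedily pick equations until every coverable variable is covered.

    At each step, choose the equation with the largest total relevance
    over variables that are still uncovered. This is a generic greedy
    coverage heuristic, assumed here purely for illustration.
    """
    uncovered = set(variables)
    selected: List[str] = []
    while uncovered:
        best_name, best_gain = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(candidates):
            if name in selected:
                continue
            gain = sum(weight for var, weight in candidates[name].items()
                       if var in uncovered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining equation covers an uncovered variable
        selected.append(best_name)
        uncovered -= set(candidates[best_name])
    return selected


# Toy example with four made-up sensors and four candidate equations.
candidates = {
    "eq1": {"s1": 0.9, "s2": 0.8},
    "eq2": {"s2": 0.7, "s3": 0.6},
    "eq3": {"s1": 0.5, "s2": 0.4, "s3": 0.3},
    "eq4": {"s3": 0.9, "s4": 0.8},
}
chosen = select_equations(candidates, {"s1", "s2", "s3", "s4"})
print(chosen)
```

In this toy input, two equations suffice to cover all four sensors, so the remaining candidates are left out as redundant. The paper's method may weigh coverage and relevance differently; the sketch only shows the general shape of a coverage-driven selection.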