Visual exploration of frequent patterns in multivariate time series

  • Authors:
  • Ming C. Hao;Manish Marwah;Halldór Janetzko;Umeshwar Dayal;Daniel A. Keim;Debprakash Patnaik;Naren Ramakrishnan;Ratnesh K. Sharma

  • Affiliations:
  • HP Labs, Palo Alto, CA;HP Labs, Palo Alto, CA;University of Konstanz, Konstanz, Germany;HP Labs, Palo Alto, CA;University of Konstanz, Konstanz, Germany;Virginia Tech.;Virginia Tech.;NEC Laboratories America, Inc.

  • Venue:
  • Information Visualization - Special issue on Visualization and Data Analysis 2011
  • Year:
  • 2012

Visualization

Abstract

The detection of frequently occurring patterns, also called motifs, in data streams has been recognized as an important task. To find these motifs, we use an advanced event encoding and pattern discovery algorithm. As a large time series can contain hundreds of motifs, there is a need to support interactive analysis and exploration. In addition, for certain applications, such as data center resource management, service managers want to be able to predict the next day's power consumption from the previous months' data. For this purpose, we introduce four novel visual analytics methods: (i) motif layout - using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs; (ii) motif distortion - enlarging or shrinking motifs for visualizing them more clearly; (iii) motif merging - combining a number of identical adjacent motif instances to simplify the display; and (iv) pattern-preserving prediction - using a pattern-preserving smoothing and prediction algorithm to provide a reliable prediction for seasonal data. We have applied these methods to three real-world datasets: data center chilling utilization, oil well production, and system resource utilization. The results enable service managers to interactively examine motifs and gain new insights into the recurring patterns to analyze system operations. Using the above methods, we have also predicted both power consumption and server utilization in data centers with an accuracy of 70-80%.
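
The abstract does not give the details of the merging step, so the following is only an illustrative sketch of the general idea behind motif merging (iii): consecutive instances of the same motif are collapsed into one display entry. The `MotifInstance` record, the `merge_adjacent` function, and the `max_gap` parameter are hypothetical names introduced here for illustration. They are assumptions, not the authors' data model or algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotifInstance:
    """Hypothetical record for one motif occurrence (not from the paper)."""
    motif_id: str
    start: int      # inclusive start index in the time series
    end: int        # exclusive end index
    count: int = 1  # number of original instances this entry represents


def merge_adjacent(instances: List[MotifInstance], max_gap: int = 0) -> List[MotifInstance]:
    """Collapse runs of the same motif whose instances follow each other.

    Two instances are merged when they share a motif_id and the second one
    starts between 0 and max_gap samples after the first one ends.
    Overlapping instances (negative gap) are left unmerged in this sketch.
    """
    merged: List[MotifInstance] = []
    for inst in sorted(instances, key=lambda m: m.start):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.motif_id == inst.motif_id
            and 0 <= inst.start - last.end <= max_gap
        ):
            # Extend the previous entry instead of adding a new rectangle.
            last.end = max(last.end, inst.end)
            last.count += inst.count
        else:
            # Copy so the caller's objects are never mutated.
            merged.append(MotifInstance(inst.motif_id, inst.start, inst.end, inst.count))
    return merged


if __name__ == "__main__":
    occurrences = [
        MotifInstance("A", 0, 5),
        MotifInstance("A", 5, 10),
        MotifInstance("A", 10, 15),
        MotifInstance("B", 15, 22),
        MotifInstance("A", 30, 35),
    ]
    for m in merge_adjacent(occurrences):
        print(m.motif_id, m.start, m.end, m.count)
    # Prints:
    # A 0 15 3
    # B 15 22 1
    # A 30 35 1
```

Keeping a `count` on each merged entry lets a display show how many instances one rectangle stands for, so the simplification does not hide how often the motif occurred.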