Searching time series with Hadoop in an electric power company

  • Authors:
  • Alice Berard; Georges Hebrail

  • Affiliations:
  • TELECOM PARISTECH, Paris, France; ELECTRICITE DE FRANCE, Clamart, France

  • Venue:
  • Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
  • Year:
  • 2013

Abstract

In this paper, we investigate the possibilities offered by the Hadoop ecosystem for searching time series in an electric power company (top-k or range queries based on a similarity measure). Much work has been done on speeding up time series search in large datasets, mainly by designing efficient indexing techniques preceded by dimensionality reduction. We do not follow these approaches; instead, we rely on the brute force of distributed computation in the Hadoop environment. We propose an implementation of time series search functions in Hadoop and describe experiments on a large database of electric power consumption curves (35 million customers observed over one month at a 30-minute sampling interval). We also show that this architecture easily supports computing several distances for the same query with only a small response time overhead, which is very useful in practice when the end user is unsure which distance to use.
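The brute-force search described in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Hadoop implementation: the function names (`map_partition`, `reduce_top_k`, `range_query`), the choice of Euclidean and Manhattan distances, and the in-memory partitions are all assumptions made for illustration. It only mirrors the general pattern of scanning every series, scoring it under several distances in one pass, and merging per-partition results.

```python
import heapq
import math

# Two example similarity measures (assumed here; the abstract does not name any).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

DISTANCES = {"euclidean": euclidean, "manhattan": manhattan}

def map_partition(records, query, k):
    """Map-like step: scan one partition once and keep the local top-k
    (smallest distances) for every distance in DISTANCES."""
    scored = {name: [] for name in DISTANCES}
    for series_id, values in records:
        # Each series is read once and scored under all distances.
        for name, dist in DISTANCES.items():
            scored[name].append((dist(query, values), series_id))
    return {name: heapq.nsmallest(k, pairs) for name, pairs in scored.items()}

def reduce_top_k(partials, k):
    """Reduce-like step: merge local top-k lists into a global top-k per distance."""
    merged = {name: [] for name in DISTANCES}
    for partial in partials:
        for name, pairs in partial.items():
            merged[name].extend(pairs)
    return {name: heapq.nsmallest(k, pairs) for name, pairs in merged.items()}

def range_query(records, query, dist, radius):
    """Range query: ids of all series within `radius` of the query."""
    return [sid for sid, values in records if dist(query, values) <= radius]

# Toy data standing in for consumption curves split across two partitions.
partition_1 = [("a", [1, 2, 3]), ("b", [10, 10, 10])]
partition_2 = [("c", [1, 2, 4]), ("d", [0, 0, 0])]
query = [1, 2, 3]

partials = [map_partition(p, query, k=2) for p in (partition_1, partition_2)]
top_k = reduce_top_k(partials, k=2)
in_range = range_query(partition_1 + partition_2, query, manhattan, radius=6)
```

Because each series is read once and scored under every distance in the same pass, adding a distance adds computation but no extra scan of the data, which is one plausible reading of the small overhead the abstract reports.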