Separating the wheat from the chaff: practical anomaly detection schemes in ecological applications of distributed sensor networks

  • Authors:
  • Luís M. A. Bettencourt;Aric A. Hagberg;Levi B. Larkey

  • Affiliations:
  • Mathematical Modeling and Analysis, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM;Mathematical Modeling and Analysis, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM;Modeling, Algorithms, and Informatics, Computer and Computational Sciences Division, Los Alamos National Laboratory, Los Alamos, NM

  • Venue:
  • DCOSS'07 Proceedings of the 3rd IEEE international conference on Distributed computing in sensor systems
  • Year:
  • 2007

Abstract

We develop a practical, distributed algorithm to detect events, identify measurement errors, and infer missing readings in ecological applications of wireless sensor networks. To address non-stationarity in environmental data streams, each sensor-processor learns statistical distributions of differences between its readings and those of its neighbors, as well as between its current and previous measurements. Scalar physical quantities such as air temperature, soil moisture, and light flux naturally display a large degree of spatiotemporal coherence, so the fluctuations between adjacent or consecutive measurements have small variances. This feature permits stable estimation over a small state space. The resulting probability distributions of differences, estimated online in real time, are then used in statistical significance tests to identify rare events. Using the spatiotemporally distributed nature of the measurements across the network, these events are classified as single-mode failures, which usually correspond to measurement errors at a single sensor, or as common-mode events. The event structure also allows the network to automatically attribute potential measurement errors to specific sensors and to correct them in real time via a combination of current measurements at neighboring nodes and the statistics of differences between them. Compared to methods that use Bayesian classification of raw data streams at each sensor, this algorithm is more storage-efficient, learns faster, and is more robust in the face of non-stationary phenomena. Field results from a wireless sensor network (Sensor Web) deployed at Sevilleta National Wildlife Refuge are presented.
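
The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: differences are summarized by a running mean and variance (Welford's method) rather than a full estimated distribution, the significance test is a simple z-score threshold, and the names `RunningStats`, `classify`, `correct`, `z_threshold`, and `min_samples` are hypothetical. A reading that jumps in time and also disagrees with most neighbors is treated as a single-mode failure; a jump in time that the neighbors share is treated as a common-mode event.

```python
import math


class RunningStats:
    """Online mean/variance of a stream of differences (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_rare(self, x, z_threshold=4.0, min_samples=10):
        # Assumed test: flag x if it lies more than z_threshold standard
        # deviations from the learned mean difference.
        if self.n < min_samples:
            return False  # too little data to judge
        s = self.std()
        if s == 0.0:
            return x != self.mean
        return abs(x - self.mean) / s > z_threshold


def classify(reading, previous, neighbor_readings, temporal, spatial):
    """Return 'normal', 'single_mode', or 'common_mode' (assumed rule).

    temporal tracks (current - previous) at this sensor;
    spatial[k] tracks (this sensor - neighbor k).
    """
    temporal_rare = temporal.is_rare(reading - previous)
    rare = [k for k, v in neighbor_readings.items()
            if spatial[k].is_rare(reading - v)]
    spatial_rare = len(rare) > len(neighbor_readings) / 2
    if temporal_rare and spatial_rare:
        return "single_mode"   # only this sensor moved: likely a bad reading
    if temporal_rare:
        return "common_mode"   # neighbors moved too: likely a real event
    return "normal"


def correct(neighbor_readings, spatial):
    """Estimate this sensor's value from neighbors plus mean differences."""
    estimates = [v + spatial[k].mean for k, v in neighbor_readings.items()]
    return sum(estimates) / len(estimates)
```

In use, each node would call `update` on its temporal and spatial statistics for every reading classified as normal, so that a flagged reading does not contaminate the learned differences. Only three numbers are stored per tracked difference, which is in keeping with the storage-efficiency claim above, although the paper's actual state representation may differ.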