MIST: distributed indexing and querying in sensor networks using statistical models

  • Authors:
  • Arnab Bhattacharya;Anand Meka;Ambuj K. Singh

  • Affiliations:
  • University of California, Santa Barbara, CA;University of California, Santa Barbara, CA;University of California, Santa Barbara, CA

  • Venue:
  • VLDB '07 Proceedings of the 33rd international conference on Very large data bases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The modeling of high level semantic events from low level sensor signals is important in order to understand distributed phenomena. For such content-modeling purposes, transformation of numeric data into symbols and the modeling of resulting symbolic sequences can be achieved using statistical models---Markov Chains (MCs) and Hidden Markov Models (HMMs). We consider the problem of distributed indexing and semantic querying over such sensor models. Specifically, we are interested in efficiently answering (i) range queries: return all sensors that have observed an unusual sequence of symbols with a high likelihood, (ii) top-1 queries: return the sensor that has the maximum probability of observing a given sequence, and (iii) 1-NN queries: return the sensor (model) which is most similar to a query model. All the above queries can be answered at the centralized base station, if each sensor transmits its model to the base station. However, this is communication-intensive. We present a much more efficient alternative---a distributed index structure, MIST (Model-based Index STructure), and accompanying algorithms for answering the above queries. MIST aggregates two or more constituent models into a single composite model, and constructs an in-network hierarchy over such composite models. We develop two kinds of composite models: the first kind captures the average behavior of the underlying models and the second kind captures the extreme behaviors of the underlying models. Using the index parameters maintained at the root of a subtree, we bound the probability of observation of a query sequence from a sensor in the subtree. We also bound the distance of a query model to a sensor model using these parameters. Extensive experimental evaluation on both real-world and synthetic data sets show that the MIST schemes scale well in terms of network size and number of model states. We also show its superior performance over the centralized schemes in terms of update, query, and total communication costs.