Full Length Article: Nearest neighbor imputation using spatial-temporal correlations in wireless sensor networks

Authors:
Yuanyuan Li; Lynne E. Parker
Affiliations:
Biostatistics Branch, National Institute of Environmental Health Sciences, NIH, DHHS, Research Triangle Park, NC 27709, United States; Distributed Intelligence Laboratory, Department of Electrical Engineering and Computer Science, The University of Tennessee, 1520 Middle Drive, Knoxville, TN 37996, United States
Venue:
Information Fusion
Year:
2014

Abstract

Missing data is common in Wireless Sensor Networks (WSNs), especially those that rely on multi-hop communication. Causes include unstable wireless links, synchronization issues, and unreliable sensors. Missing data creates several problems for WSNs. First, because most sensor nodes are battery-powered, re-transmitting missing data across the network is too expensive. Second, re-transmission can delay the detection of abnormal changes in the environment. Third, localized reasoning techniques on sensor nodes (such as machine learning algorithms that classify states of the environment) are generally not robust to missing data. Because sensor data collected by a WSN is generally correlated in time and space, we illustrate how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. Our studies show, however, that it is important to determine which nodes are spatially and temporally correlated with each other; simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. We have therefore developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To reduce search time, we use a kd-tree, a non-parametric, data-driven binary search tree. Instead of the traditional per-dimension mean and variance for kd-tree construction and Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on the measured percentage of missing data. We evaluated this approach on sensor data from a volcano dataset collected by a network of Crossbow motes and on sensor data from a highway traffic monitoring application. Our experimental results show that the proposed K-NN imputation method achieves accuracy competitive with state-of-the-art Expectation-Maximization (EM) techniques while using much simpler computation, making it suitable for resource-constrained WSNs.
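
The idea of searching with a weighted Euclidean distance can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weighting formula, so the weight `1 - missing_pct` used here is an assumption, as are the function names `weighted_distance` and `nn_impute` and the toy readings. For brevity the sketch scans all past observations linearly instead of building the kd-tree the abstract describes.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the dimensions observed in `a`."""
    total = 0.0
    for x, y, w in zip(a, b, weights):
        if x is not None:  # skip dimensions that are missing from the query
            total += w * (x - y) ** 2
    return math.sqrt(total)


def nn_impute(history, query, missing_pct):
    """Fill None entries in `query` from its nearest complete past observation.

    history     -- list of complete rows, one value per sensor node
    query       -- current row, with None where a node's reading is missing
    missing_pct -- measured fraction of missing readings per node (0.0 to 1.0)
    """
    # Assumed weighting (not from the paper): nodes that drop more
    # readings contribute less to the distance.
    weights = [1.0 - p for p in missing_pct]
    # Linear scan stands in for the kd-tree search.
    nearest = min(history, key=lambda row: weighted_distance(query, row, weights))
    return [n if q is None else q for q, n in zip(query, nearest)]


# Hypothetical readings from three sensor nodes at three past time steps.
history = [
    [20.0, 21.0, 19.5],
    [25.0, 26.0, 24.0],
    [30.0, 31.5, 29.0],
]
missing_pct = [0.05, 0.40, 0.10]
query = [24.6, None, 24.3]  # the second node's reading was lost

print(nn_impute(history, query, missing_pct))
```

The observed readings in `query` are kept unchanged and only the missing entry is taken from the closest past observation. A kd-tree would replace the linear scan to cut search time on larger histories.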