Missing data is common in Wireless Sensor Networks (WSNs), especially with multi-hop communications. Causes include unstable wireless communications, synchronization issues, and unreliable sensors. Missing data creates a number of problems for WSNs. First, since most sensor nodes in the network are battery-powered, it is too expensive to have the nodes re-transmit missing data across the network. Data re-transmission may also delay the detection of abnormal changes in an environment. Furthermore, localized reasoning techniques on sensor nodes (such as machine learning algorithms that classify states of the environment) are generally not robust enough to handle missing data. Since sensor data collected by a WSN is generally correlated in time and space, we show how replacing missing sensor values with spatially and temporally correlated sensor values can significantly improve the network's performance. However, our studies show that it is important to determine which nodes are spatially and temporally correlated with each other. Simple techniques based on Euclidean distance are not sufficient for complex environmental deployments. Thus, we have developed a novel Nearest Neighbor (NN) imputation method that estimates missing data in WSNs by learning spatial and temporal correlations between sensor nodes. To reduce search time, we use a kd-tree, a non-parametric, data-driven binary search tree. Instead of using the traditional mean and variance of each dimension for kd-tree construction and the Euclidean distance for kd-tree search, we use weighted variances and weighted Euclidean distances based on the measured percentages of missing data. We have evaluated this approach through experiments on sensor data from a volcano dataset collected by a network of Crossbow motes, as well as experiments using sensor data from a highway traffic monitoring application.
Our experimental results show that the proposed K-NN imputation method achieves accuracy competitive with state-of-the-art Expectation-Maximization (EM) techniques while requiring much simpler computation, making it suitable for use in resource-constrained WSNs.
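The kd-tree idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the code assumes each sensor's weight is one minus its measured fraction of missing readings. It also uses a single nearest neighbor rather than K neighbors, and all names (`build`, `wdist2`, `nearest`, `impute`, `history`, `missing_fraction`) are hypothetical. The tree splits on the dimension with the largest weighted variance, and the search uses a weighted Euclidean distance over the dimensions that are actually observed in the query.

```python
from statistics import pvariance

def build(points, weights):
    """Build a kd-tree, splitting on the dimension with the largest weighted variance."""
    if not points:
        return None
    d = max(range(len(weights)),
            key=lambda j: weights[j] * pvariance([p[j] for p in points]))
    pts = sorted(points, key=lambda p: p[d])
    mid = len(pts) // 2  # median point becomes the node
    return {"point": pts[mid], "dim": d,
            "left": build(pts[:mid], weights),
            "right": build(pts[mid + 1:], weights)}

def wdist2(p, q, weights):
    """Weighted squared Euclidean distance over the dimensions observed in q."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights) if b is not None)

def nearest(node, q, weights, best=None):
    """Return (squared distance, point) for the stored point closest to q."""
    if node is None:
        return best
    d2 = wdist2(node["point"], q, weights)
    if best is None or d2 < best[0]:
        best = (d2, node["point"])
    d = node["dim"]
    if q[d] is None:
        # The split dimension is missing in the query, so neither side can be pruned.
        best = nearest(node["left"], q, weights, best)
        return nearest(node["right"], q, weights, best)
    diff = q[d] - node["point"][d]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, q, weights, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if weights[d] * diff ** 2 < best[0]:
        best = nearest(far, q, weights, best)
    return best

def impute(tree, q, weights):
    """Fill the missing (None) entries of q from its nearest stored neighbor."""
    _, nn = nearest(tree, q, weights)
    return [nn[j] if v is None else v for j, v in enumerate(q)]

# Hypothetical complete readings from three sensors at four time steps.
history = [[20.0, 21.0, 35.0], [22.0, 23.0, 36.0],
           [25.0, 26.0, 30.0], [30.0, 31.0, 28.0]]
# Assumed weighting: sensors that drop more readings count for less.
missing_fraction = [0.1, 0.5, 0.2]
weights = [1 - m for m in missing_fraction]

tree = build(history, weights)
print(impute(tree, [24.5, None, 30.5], weights))  # [24.5, 26.0, 30.5]
```

In this sketch, a query whose split dimension is missing forces the search into both subtrees, so pruning becomes less effective as more readings are lost. That is one reason a weighting based on missing percentages is plausible at construction time: it steers the splits toward sensors that are usually observed.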