In this paper, we present Divide and Recombine (D&R), an approach to large-scale data analysis, and describe a hardware and software implementation that supports it. We then illustrate the use of D&R on large-scale power systems sensor data to perform initial exploration, discover multiple data integrity issues, build and validate algorithms to filter bad data, and construct statistical event detection algorithms. We also report on our experience running a non-traditional Hadoop distributed computing setup on top of an HPC cluster.