Patch clustering for massive data sets
Neurocomputing
Efficient Learning from Massive Spatial-Temporal Data through Selective Support Vector Propagation
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy
Running data mining algorithms such as clustering on massive geospatial data sets is still neither feasible nor efficient today. In this paper, we introduce a k-means algorithm that is based on the data stream paradigm. The so-called partial/merge k-means algorithm is implemented as a set of data stream operators that adapt to available computing resources such as volatile memory and processing power. The partial data stream operator consumes as much data as fits into RAM and performs a weighted k-means on that data subset. Subsequently, the weighted partial results are merged by a second data stream operator. All operators can be cloned and parallelized. In our analytical and experimental performance evaluation, we demonstrate that the partial/merge k-means can outperform a one-step algorithm by a large margin with regard to overall computation time and clustering quality as data density per grid cell increases.
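The partial and merge steps described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes NumPy, plain Lloyd iterations with random initialization, and in-memory chunks standing in for the data stream. The names `weighted_kmeans` and `partial_merge_kmeans` and the parameters `n_iter` and `seed` are hypothetical, chosen only for illustration.

```python
import numpy as np


def weighted_kmeans(points, weights, k, n_iter=20, seed=0):
    """Weighted Lloyd's k-means (illustrative helper, not from the paper).

    Returns the non-empty centroids and the total weight assigned to each.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    k = min(k, len(points))
    rng = np.random.default_rng(seed)
    # Assumption: initialize with k distinct input points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()

    def assign(cents):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            mask = labels == j
            if mask.any():
                # Each centroid is the weighted mean of its members.
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])

    labels = assign(centroids)
    cluster_weights = np.bincount(labels, weights=weights, minlength=k)
    keep = cluster_weights > 0  # drop clusters that ended up empty
    return centroids[keep], cluster_weights[keep]


def partial_merge_kmeans(chunks, k, seed=0):
    """Partial step on each chunk, then one merge step over the partial results."""
    partial_centroids, partial_weights = [], []
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        # Partial operator: every raw point starts with unit weight.
        c, w = weighted_kmeans(chunk, np.ones(len(chunk)), k, seed=seed)
        partial_centroids.append(c)
        partial_weights.append(w)
    # Merge operator: cluster the partial centroids, weighted by the
    # number of raw points each one represents.
    return weighted_kmeans(
        np.concatenate(partial_centroids),
        np.concatenate(partial_weights),
        k,
        seed=seed,
    )
```

In this sketch each chunk plays the role of one RAM-sized portion of the stream, for example `partial_merge_kmeans(np.array_split(data, 4), k=2)`. Because the merge step only sees at most `k` weighted centroids per chunk, its cost depends on the number of chunks rather than on the number of raw points, and the chunks could be processed by independent workers before merging.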