Few data mining algorithms work in a massively parallel and yet online (i.e., incremental) fashion. Combining both features is essential for mining large data streams and adds scalability to the concept of Online Aggregation introduced by J. M. Hellerstein et al. in 1997. We show how an online version of the Map-Reduce programming model can be used to implement such algorithms, and we propose a solution for the "hardest" class of these algorithms: those requiring multiple Map-Reduce phases. An experimental evaluation confirms that the proposed methods can substantially accelerate interactive analysis of large data sets and facilitate scalable stream mining.
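To make the idea of an online Map-Reduce computation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-phase job that computes per-key means, with the input arriving in batches. The names `map_batch`, `reduce_incremental`, and `online_means`, as well as the `(key, value)` record shape, are hypothetical and chosen only for illustration. The running estimate is available after every batch, which is the behaviour Online Aggregation relies on.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape and partial-result type (assumptions, not from the paper).
Record = Tuple[str, float]               # (key, value)
Partial = Dict[str, Tuple[float, int]]   # key -> (sum, count)


def map_batch(batch: Iterable[Record]) -> Partial:
    """Map step: summarize one batch as per-key (sum, count) pairs."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, value in batch:
        s, n = acc[key]
        acc[key] = (s + value, n + 1)
    return dict(acc)


def reduce_incremental(state: Partial, partial: Partial) -> Partial:
    """Reduce step: merge a new partial result into the running state."""
    merged = dict(state)
    for key, (s, n) in partial.items():
        s0, n0 = merged.get(key, (0.0, 0))
        merged[key] = (s0 + s, n0 + n)
    return merged


def online_means(batches: Iterable[List[Record]]) -> Iterator[Dict[str, float]]:
    """Yield the running per-key mean after each batch is processed."""
    state: Partial = {}
    for batch in batches:
        state = reduce_incremental(state, map_batch(batch))
        yield {key: s / n for key, (s, n) in state.items()}


if __name__ == "__main__":
    stream = [
        [("a", 1.0), ("b", 10.0)],
        [("a", 3.0)],
        [("b", 20.0), ("a", 5.0)],
    ]
    for step, estimate in enumerate(online_means(stream), start=1):
        print(step, estimate)
```

Because merging `(sum, count)` pairs is associative and commutative, the map step could in principle run on many workers in parallel and the partial results could be merged in any order. Algorithms that need multiple Map-Reduce phases do not reduce to a single mergeable state of this kind, and this sketch does not attempt to cover them.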