Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Google news personalization: scalable online collaborative filtering
Proceedings of the 16th international conference on World Wide Web
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pfp: parallel fp-growth for query recommendation
Proceedings of the 2008 ACM conference on Recommender systems
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Proceedings of the 18th international conference on World wide web
Pairwise document similarity in large collections with MapReduce
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications
AAIM '09 Proceedings of the 5th International Conference on Algorithmic Aspects in Information and Management
PLANET: massively parallel learning of tree ensembles with MapReduce
Proceedings of the VLDB Endowment
An Efficient Hierarchical Clustering Method for Large Datasets with Map-Reduce
PDCAT '09 Proceedings of the 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies
Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce
Proceedings of the 19th international conference on World wide web
Efficient parallel set-similarity joins using MapReduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A comparison of join algorithms for log processing in MaPreduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Parallel Simultaneous Co-clustering and Learning with Map-Reduce
GRC '10 Proceedings of the 2010 IEEE International Conference on Granular Computing
Document Similarity Self-Join with MapReduce
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
PEGASUS: mining peta-scale graphs
Knowledge and Information Systems - Special Issue: Best Papers of the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)
Processing theta-joins using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Spectral analysis for billion-scale graphs: discoveries and implementation
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
TWITOBI: A Recommendation System for Twitter Using Probabilistic Modeling
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce
ICPADS '11 Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
Mr. LDA: a flexible large scale topic modeling package using variational inference in MapReduce
Proceedings of the 21st international conference on World Wide Web
V-SMART-join: a scalable mapreduce framework for all-pair similarity joins of multisets and vectors
Proceedings of the VLDB Endowment
Parallel Top-K Similarity Join Algorithms Using MapReduce
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Hi-index | 0.00 |
There is a growing trend of applications that should handle big data. However, analyzing big data is a very challenging problem today. For such applications, the MapReduce framework has recently attracted a lot of attention. Google's MapReduce or its open-source equivalent Hadoop is a powerful tool for building such applications. In this tutorial, we will introduce the MapReduce framework based on Hadoop, discuss how to design efficient MapReduce algorithms and present the state-of-the-art in MapReduce algorithms for data mining, machine learning and similarity joins. The intended audience of this tutorial is professionals who plan to design and develop MapReduce algorithms and researchers who should be aware of the state-of-the-art in MapReduce algorithms available today for big data analysis.