Efficient analytics on ordered datasets using MapReduce
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Event log files are among the most common datasets that corporations use to gain business intelligence. The records in these files are often temporally ordered, and they need to be grouped by user ID with that temporal ordering preserved to facilitate mining of user behavior. This kind of analytical workload, here referred to as Relative Order-preserving based Grouping (RE-ORG), is common in big data analytics. Executing RE-ORG tasks on ordered datasets with MapReduce/Hadoop is inefficient because of its internal sort-merge mechanism, which sorts intermediate records by key even when the input is already in temporal order. In this paper, we propose a distributed framework that adopts an efficient group-order-merge mechanism to execute RE-ORG tasks faster. We demonstrate the advantage of our framework by comparing its performance with that of Hadoop in extensive experiments on real-world datasets. The evaluation results show that our framework achieves up to 6.3x speedup over Hadoop when executing RE-ORG tasks.
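The Python sketch below illustrates what a RE-ORG task computes. It is not the paper's framework. It contrasts two approaches on time-ordered input:

- a sort-based grouping, in the spirit of a sort-merge shuffle;
- a single-pass group-then-merge over input splits.

Several details are hypothetical assumptions: the record layout `(timestamp, user_id, event)`, all function names, and the requirement that splits are contiguous, time-ordered chunks visited in order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout (an assumption, not from the paper):
# (timestamp, user_id, event)
Record = Tuple[int, str, str]


def reorg_sort_based(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Sort-merge style: sort by (user_id, timestamp), then group.

    This re-sorts records that are already in temporal order.
    """
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in sorted(records, key=lambda r: (r[1], r[0])):
        groups[rec[1]].append(rec)
    return dict(groups)


def group_split(split: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group one time-ordered split by user in a single pass.

    Appending in arrival order keeps each user's records in temporal order.
    """
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in split:
        groups[rec[1]].append(rec)
    return groups


def reorg_group_merge(splits: List[List[Record]]) -> Dict[str, List[Record]]:
    """Group each split independently, then concatenate per-user lists.

    Assumption: splits are contiguous, time-ordered chunks of the log,
    so visiting them in list order preserves each user's global order.
    """
    merged: Dict[str, List[Record]] = defaultdict(list)
    for split in splits:  # must be visited in temporal order
        for user, recs in group_split(split).items():
            merged[user].extend(recs)
    return dict(merged)


if __name__ == "__main__":
    log = [
        (1, "alice", "login"),
        (2, "bob", "search"),
        (3, "alice", "click"),
        (4, "bob", "buy"),
        (5, "alice", "logout"),
    ]
    splits = [log[:2], log[2:4], log[4:]]
    result = reorg_group_merge(splits)
    print(result["alice"])
    # prints [(1, 'alice', 'login'), (3, 'alice', 'click'), (5, 'alice', 'logout')]
```

On time-ordered input, both functions return the same per-user lists. The group-then-merge version needs no sort, only a pass over each split and an ordered concatenation. This loosely mirrors the abstract's motivation for avoiding sort-merge on RE-ORG tasks.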