Distributed data mining (DDM), which often relies on autonomous agents, extracts globally interesting associations, classifiers, clusters, and other patterns from distributed data. As datasets double in size every year, repeatedly moving the data to distant CPUs incurs a high communication cost. In this paper, a data cloud is used to implement DDM so that computation moves to the data rather than the data moving to the computation. MapReduce is a popular programming model for data-centric distributed computing. First, a cloud system architecture for DDM is proposed. Second, a modified MapReduce framework, called pipelined MapReduce, is presented. We select Apriori as a case study and discuss its implementation within the MapReduce framework. Finally, several experiments are conducted. The results show that, with a moderate number of map tasks, the execution time of DDM algorithms (here, Apriori) can be reduced remarkably. A performance comparison between traditional MapReduce and our pipelined MapReduce shows that map and reduce tasks in the pipelined version can run in parallel, and that pipelined MapReduce greatly decreases the execution time of the DDM algorithm. Data clouds are suitable for a multitude of DDM algorithms and can provide significant speedups.
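To make the Apriori case study concrete, the following is a minimal sketch of how the level-wise support counting of Apriori can be phrased as a map step (count candidates inside each data partition) and a reduce step (sum the partial counts and apply the support threshold). It is an illustration under stated assumptions, not the implementation described in the paper: the function names `map_count`, `reduce_counts`, and `apriori`, the toy partitions, and the threshold are all hypothetical, everything runs in a single process rather than on a data cloud, and the pipelined variant of MapReduce is not modeled.

```python
from collections import Counter
from itertools import combinations


def map_count(partition, candidates, k):
    """Map task (hypothetical): count candidate k-itemsets in one partition."""
    local = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            if itemset in candidates:
                local[itemset] += 1
    return local


def reduce_counts(partial_counts, min_support):
    """Reduce task (hypothetical): sum partial counts, keep frequent itemsets."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def apriori(partitions, min_support):
    """Level-wise Apriori driver: one map/reduce round per itemset size k."""
    items = {i for p in partitions for t in p for i in t}
    candidates = {(i,) for i in items}
    frequent, k = {}, 1
    while candidates:
        level = reduce_counts(
            (map_count(p, candidates, k) for p in partitions), min_support
        )
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = {
            c
            for a in level
            for b in level
            for c in [tuple(sorted(set(a) | set(b)))]
            if len(c) == k and all(s in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy data (an assumption): two partitions standing in for two storage nodes.
partitions = [
    [["bread", "milk"], ["bread", "beer", "milk"]],
    [["milk", "beer"], ["bread", "milk", "beer"]],
]

if __name__ == "__main__":
    print(apriori(partitions, min_support=3))
```

In this sketch only the small per-partition count tables cross the boundary between map and reduce, while the transactions themselves stay where they are stored, which is the communication saving the abstract argues for.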