A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems

Authors:
Junbo Zhang; Jian-Syuan Wong; Tianrui Li; Yi Pan
Affiliations:
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China and Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA; Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA; School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China; Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
Venue:
International Journal of Approximate Reasoning
Year:
2014

Abstract

As data volumes grow at an unprecedented rate, large-scale data mining and knowledge discovery have become a new challenge. Rough set theory has been applied successfully to knowledge acquisition in data mining, and the recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, this paper presents parallel large-scale rough set based methods for knowledge acquisition using MapReduce. We implemented the methods on three representative MapReduce runtime systems (Hadoop, Phoenix and Twister) and report performance comparisons across them. The experimental results show that (1) computation time is in most cases lowest on Twister when the same number of cores is used; (2) Hadoop has the best speedup for larger data sets; and (3) Phoenix has the best speedup for smaller data sets. The strong speedups also demonstrate that the proposed parallel methods can effectively process very large data sets on different runtime systems. The experiments further illustrate the pitfalls and advantages of each runtime system, which can help users decide which one to use in their applications.
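
To make the idea of rough set computation in a MapReduce style concrete, the following is a minimal single-process sketch in Python. It is not the authors' algorithm and does not use Hadoop, Phoenix or Twister; it only mimics the map, shuffle and reduce steps in memory. The function names (map_phase, shuffle, reduce_phase, approximations) and the toy decision table are assumptions made for illustration. The sketch groups objects into equivalence classes by their condition attribute values and then derives the standard lower and upper approximations of one decision class.

```python
from collections import defaultdict


def map_phase(records, condition_idx, decision_idx):
    """Emit (condition-value key, (object id, decision value)) pairs."""
    for obj_id, row in enumerate(records):
        key = tuple(row[i] for i in condition_idx)
        yield key, (obj_id, row[decision_idx])


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Per equivalence class, collect its members and the decisions seen."""
    for key, values in groups.items():
        members = {obj_id for obj_id, _ in values}
        decisions = {decision for _, decision in values}
        yield key, members, decisions


def approximations(records, condition_idx, decision_idx, target):
    """Return (lower, upper) approximations of the decision class `target`."""
    lower, upper = set(), set()
    groups = shuffle(map_phase(records, condition_idx, decision_idx))
    for _, members, decisions in reduce_phase(groups):
        if target in decisions:
            upper |= members          # class overlaps the target concept
            if decisions == {target}:
                lower |= members      # class lies entirely inside the target
    return lower, upper


# Hypothetical decision table: (temperature, cough, diagnosis).
table = [
    ("high", "yes", "flu"),
    ("high", "yes", "flu"),
    ("high", "no", "flu"),
    ("high", "no", "cold"),
    ("low", "no", "cold"),
]

lower, upper = approximations(table, condition_idx=(0, 1), decision_idx=2, target="flu")
print(sorted(lower), sorted(upper))
```

In this toy table, objects 2 and 3 share the same condition values but have different decisions, so they fall in the upper approximation of "flu" but not in the lower one. In a real runtime the map and reduce functions would run on separate workers, and the grouping step would be handled by the framework rather than by an in-memory dictionary.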