International Journal of Approximate Reasoning
With the volume of data growing at an unprecedented rate, large-scale data mining and knowledge discovery have become a new challenge. Rough set theory has been successfully applied to knowledge acquisition in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, we present parallel large-scale rough set based methods for knowledge acquisition using MapReduce. We implemented them on several representative MapReduce runtime systems, namely Hadoop, Phoenix and Twister, and report performance comparisons across these systems. The experimental results show that (1) computation time is, in most cases, lowest on Twister when the same number of cores is used; (2) Hadoop has the best speedup for larger data sets; and (3) Phoenix has the best speedup for smaller data sets. The excellent speedups also demonstrate that the proposed parallel methods can effectively process very large data sets on different runtime systems. Our experiments also illustrate the pitfalls and advantages of these runtime systems, which can help users decide which runtime system to use in their applications.