Parallel data mining revisited. Better, not faster

Authors:
Zaenal Akbar;Violeta N. Ivanova;Michael R. Berthold
Affiliations:
Nycomed-Chair for Bioinformatics and Information Mining, Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany;Nycomed-Chair for Bioinformatics and Information Mining, Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany;Nycomed-Chair for Bioinformatics and Information Mining, Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany
Venue:
IDA'12 Proceedings of the 11th international conference on Advances in Intelligent Data Analysis
Year:
2012

Abstract

In this paper, we argue that parallel and/or distributed compute resources can be used differently: instead of focusing on speeding up algorithms, we propose focusing on improving accuracy. In a nutshell, the goal is to tune data mining algorithms to produce better results in the same amount of time, rather than similar results much faster. We discuss a number of generic ways of tuning data mining algorithms and elaborate on two prominent examples in more detail. A series of illustrative experiments shows the effect that such use of parallel resources can have.