In a world where massive amounts of data are recorded on a large scale, we need data mining technologies that extract knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well to large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well to large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and examine its limitations. We then describe our work to overcome these limitations by developing a framework for parallelising algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
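To make the rule-induction style concrete, the following is a minimal sketch of Cendrowska-style Prism separate-and-conquer rule induction, the family of algorithms the abstract refers to. It is illustrative only: the data representation, function name, and stopping criteria are simplifying assumptions, not the paper's (or PrismTCS's) actual implementation.

```python
def prism_rules_for_class(instances, attributes, target_class):
    """Induce modular rules covering all instances of target_class.

    instances: list of (features, label) pairs, where features is a
    dict mapping attribute name -> categorical value.
    Returns a list of rules; each rule is a dict of attr=value terms
    interpreted as a conjunction.
    """
    remaining = list(instances)
    rules = []
    while any(label == target_class for _, label in remaining):
        rule = {}                      # conjunction of attr=value terms
        covered = remaining
        free_attrs = set(attributes)
        # Grow the rule until it is "perfect" on the covered subset,
        # i.e. covers only target_class instances (or attributes run out).
        while free_attrs and any(lbl != target_class for _, lbl in covered):
            best_term, best_prob = None, -1.0
            for attr in free_attrs:
                for value in {feats[attr] for feats, _ in covered}:
                    subset = [(f, l) for f, l in covered if f[attr] == value]
                    # Probability of the target class among instances
                    # matching this candidate attr=value term.
                    prob = sum(l == target_class for _, l in subset) / len(subset)
                    if prob > best_prob:
                        best_term, best_prob = (attr, value), prob
            attr, value = best_term
            rule[attr] = value
            free_attrs.discard(attr)
            covered = [(f, l) for f, l in covered if f[attr] == value]
        rules.append(rule)
        # Separate-and-conquer: remove the instances the new rule covers
        # before inducing the next rule.
        remaining = [(f, l) for f, l in remaining
                     if not all(f[a] == v for a, v in rule.items())]
    return rules
```

Unlike TDIDT, which splits the whole dataset on one attribute at every tree node, each Prism rule is grown independently on the instances it covers, which is what makes the rules "modular" and also what the paper's parallelisation framework exploits.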