Scalable inductive learning on partitioned data

Authors:
Qijun Chen; Xindong Wu; Xingquan Zhu
Affiliations:
Department of Computer Science, University of Vermont, Burlington, Vermont (all authors)
Venue:
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
Year:
2005


Abstract

With the rapid advancement of information technology, learning algorithms must scale to large, real-world data repositories. In this paper, scalability is achieved through a data reduction technique that partitions a large data set into subsets, applies a learning algorithm to each subset sequentially or concurrently, and then integrates the learned results. We identify five strategies for achieving scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) and design and develop seven corresponding scalable schemes. A substantial number of experiments were performed to evaluate these schemes. The results demonstrate that, through data reduction, some of our schemes can effectively build accurate classifiers from the weak classifiers learned on data subsets. Furthermore, our schemes require significantly less training time than generating a global classifier.
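
The partition, learn, and integrate pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not one of the paper's seven schemes: the weak learner (a single-attribute rule table), the round-robin partitioning, and the accuracy-weighted vote used for integration are all assumptions chosen for brevity, and every function name here is hypothetical. The subsets are processed sequentially, although each call to the learner is independent and could equally run concurrently.

```python
from collections import Counter, defaultdict

def partition(examples, k):
    """Split the data set into k disjoint subsets (round-robin; an assumption)."""
    return [examples[i::k] for i in range(k)]

def learn_one_rule(subset):
    """Hypothetical weak learner: keep the single attribute whose
    value -> majority-class table makes the fewest errors on the subset."""
    default = Counter(label for _, label in subset).most_common(1)[0][0]
    best = None
    for attr in range(len(subset[0][0])):
        counts = defaultdict(Counter)
        for x, label in subset:
            counts[x[attr]][label] += 1
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(table[x[attr]] == label for x, label in subset)
        if best is None or correct > best[0]:
            best = (correct, attr, table)
    correct, attr, table = best
    # The weight is the rule's training accuracy on its own subset.
    return {"attr": attr, "table": table, "default": default,
            "weight": correct / len(subset)}

def predict_one(model, x):
    """Classify x with one subset classifier, falling back to its default class."""
    return model["table"].get(x[model["attr"]], model["default"])

def train_partitioned(examples, k):
    """Learn one weak classifier per subset (sequentially in this sketch)."""
    return [learn_one_rule(subset) for subset in partition(examples, k)]

def integrate(models, x):
    """Combine the subset classifiers by an accuracy-weighted vote (an assumption)."""
    votes = defaultdict(float)
    for model in models:
        votes[predict_one(model, x)] += model["weight"]
    return max(votes, key=votes.get)

# Toy data: the class depends only on the first attribute; the second is noise.
data = [(("sunny", "a"), "yes"), (("rainy", "a"), "no"),
        (("sunny", "b"), "yes"), (("rainy", "b"), "no")] * 3
models = train_partitioned(data, k=3)
prediction = integrate(models, ("sunny", "b"))
```

Each subset classifier sees only a third of the toy data, and the integration step is the only place where their results meet. That mirrors the data reduction idea in the abstract, though the paper's own integration strategies may differ substantially from this vote.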