Data mining tasks and methods: scalability

Authors:
Foster Provost; Venkateswarlu Kolluri
Affiliations:
Associate Professor of Information Systems, Leonard N. Stern School of Business, New York University, New York; Research Scientist, Terra Lycos, Waltham, Massachusetts
Venue:
Handbook of data mining and knowledge discovery
Year:
2002


Abstract

One of the defining challenges for the KDD research community is scaling up data mining algorithms to mine very large collections of data. This article summarizes, categorizes, and compares existing work on scaling up data mining algorithms. To provide focus and specific details, we concentrate on algorithms that build decision trees and rule sets; the issues and techniques generalize to other types of data mining. We discuss the important issues in scaling up and highlight similarities among scaling techniques by categorizing them into three main approaches. We describe the characteristic features of each category in detail, using specific examples as needed, and we compare and contrast the constituent techniques.