One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.