Scaling up: distributed machine learning with cooperation

Authors:
Foster John Provost; Daniel N. Hennessy
Affiliations:
NYNEX Science & Technology, White Plains, NY; Computer Science Department, University of Pittsburgh, Pittsburgh, PA
Venue:
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Year:
1996

Abstract

Machine-learning methods are becoming increasingly popular for automated data analysis. However, standard methods do not scale up to massive scientific and business data sets without expensive hardware. This paper investigates a practical alternative for scaling up: the use of distributed processing to take advantage of the often dormant PCs and workstations available on local networks. Each workstation runs a common rule-learning program on a subset of the data. We first show that for commonly used rule-evaluation criteria, a simple form of cooperation can guarantee that a rule will look good to the set of cooperating learners if and only if it would look good to a single learner operating with the entire data set. We then show how such a system can further capitalize on different perspectives by sharing learned knowledge to reduce search effort significantly. We demonstrate the power of the method by learning from a massive data set taken from the domain of cellular fraud detection. Finally, we provide an overview of other methods for scaling up machine learning.
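The cooperation guarantee described in the abstract can be illustrated with a small sketch. It does not reproduce the paper's actual criteria or protocol. It assumes a single evaluation criterion, "the rule covers at least one example and its precision TP/(TP+FP) meets a threshold", plus a hypothetical data model and helper names (Counts, count, looks_good, cooperative_accept). Under this criterion, pooled precision is a weighted average of the per-partition precisions. That means a rule that looks good on the full data must look good on at least one partition, so some learner will propose it. Requiring the pooled counts from all learners to confirm the rule prevents a locally lucky rule from being accepted. Together these give the "if and only if" behavior. Criteria that do not decompose this way may need per-partition adjustments that this sketch does not attempt.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Iterable, List, Tuple

# Hypothetical data model (assumption): an example is (attribute dict, is_positive).
Example = Tuple[dict, bool]
Rule = Callable[[dict], bool]


@dataclass
class Counts:
    tp: int = 0  # positive examples the rule covers
    fp: int = 0  # negative examples the rule covers

    def __add__(self, other: "Counts") -> "Counts":
        return Counts(self.tp + other.tp, self.fp + other.fp)


def count(rule: Rule, examples: Iterable[Example]) -> Counts:
    """Tally the coverage statistics one learner sees on its own subset."""
    c = Counts()
    for attrs, positive in examples:
        if rule(attrs):
            if positive:
                c.tp += 1
            else:
                c.fp += 1
    return c


def looks_good(c: Counts, min_precision: Fraction) -> bool:
    """Assumed criterion: the rule covers something and its precision meets the threshold."""
    covered = c.tp + c.fp
    return covered > 0 and c.tp >= min_precision * covered


def cooperative_accept(rule: Rule, partitions: List[List[Example]],
                       min_precision: Fraction) -> bool:
    """Accept a rule only if some learner proposes it and the pooled counts confirm it."""
    local = [count(rule, part) for part in partitions]
    # A learner proposes the rule only if it looks good on that learner's own subset.
    proposed = any(looks_good(c, min_precision) for c in local)
    # Peers report only their counts; no raw examples are exchanged.
    pooled = sum(local, Counts())
    return proposed and looks_good(pooled, min_precision)


if __name__ == "__main__":
    # A rule that looks good on partition 0 (precision 1/1) but poor overall (1/6).
    parts = [[({"a": 1}, True)], [({"a": 1}, False)] * 5]
    rule = lambda x: x["a"] == 1
    threshold = Fraction(4, 5)
    print(any(looks_good(count(rule, p), threshold) for p in parts))  # True
    print(cooperative_accept(rule, parts, threshold))  # False
```

Exact fractions are used for the threshold so that boundary cases such as 4/5 of 5 examples are compared without floating-point rounding.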