The unrestrainable growth of data in many domains in which machine learning could be applied has given rise to a new field, large-scale learning, which aims to develop algorithms that are efficient and scalable with respect to computation, memory, time, and communication. A promising line of research in large-scale learning is distributed learning: learning from data stored at different locations and then selecting and combining the "local" classifiers into a single global answer, typically using one of three main approaches. This paper addresses a significant issue that arises when distributed data comes from several sources, each with a different distribution. The class-probability distribution of the data (CPDD) is defined, and its impact on the performance of the three combination approaches is analyzed. The results show the necessity of taking the CPDD into account, and lead to the conclusion that combining only related knowledge is the most appropriate way to learn in a distributed manner.
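The combine-local-classifiers idea can be sketched minimally as follows. The two "sites", the nearest-centroid local learner, and the majority-vote combination rule are illustrative assumptions for this sketch, not the paper's actual algorithms; note that the two sites deliberately have different class-probability distributions, the situation the paper analyzes.

```python
from collections import Counter

# Hypothetical 1-D labeled data held at two sites whose class-probability
# distributions differ: site A is dominated by class 0, site B by class 1.
site_a = [(0.9, 0), (1.1, 0), (1.0, 0), (3.2, 1)]
site_b = [(1.2, 0), (2.9, 1), (3.1, 1), (3.0, 1)]

def train_centroids(data):
    """Local learning step: fit a nearest-centroid classifier on one site's data."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

# Combination step: each site's classifier votes, and the global answer
# is the majority class among the local predictions.
models = [train_centroids(site_a), train_centroids(site_b)]

def global_predict(x):
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

print(global_predict(1.0))  # near the class-0 centroids of both sites
print(global_predict(3.0))  # near the class-1 centroids of both sites
```

Only the fitted centroids, not the raw examples, leave each site, which is the usual motivation for combining local models instead of pooling the data.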