The unrestrainable growth of data in many domains in which machine learning could be applied has given rise to a new field, large-scale learning, which aims to develop algorithms that are efficient and scalable with respect to computation, memory, time, and communication. A promising line of research in large-scale learning is distributed learning: learning from data stored at different locations and then selecting and combining the "local" classifiers into a single global answer, typically using one of three main approaches. This paper addresses a significant issue that arises when distributed data comes from several sources, each with a different distribution. The class-probability distribution of the data (CPDD) is defined, and its impact on the performance of the three combination approaches is analyzed. The results show the necessity of taking the CPDD into account, and lead to the conclusion that combining only related knowledge is the most appropriate way to learn in a distributed manner.
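The combine-local-classifiers idea can be sketched minimally as follows. The two "sites", the nearest-centroid local learner, and the majority-vote combination rule are illustrative assumptions for this sketch, not the paper's actual algorithms; note that the two sites deliberately have different class-probability distributions, the situation the paper analyzes.

```python
from collections import Counter

# Hypothetical 1-D labeled data held at two sites whose class-probability
# distributions differ: site A is dominated by class 0, site B by class 1.
site_a = [(0.9, 0), (1.1, 0), (1.0, 0), (3.2, 1)]
site_b = [(1.2, 0), (2.9, 1), (3.1, 1), (3.0, 1)]

def train_centroids(data):
    """Local learning step: fit a nearest-centroid classifier on one site's data."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

# Combination step: each site's classifier votes, and the global answer
# is the majority class among the local predictions.
models = [train_centroids(site_a), train_centroids(site_b)]

def global_predict(x):
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

print(global_predict(1.0))  # near the class-0 centroids of both sites
print(global_predict(3.0))  # near the class-1 centroids of both sites
```

Only the fitted centroids, not the raw examples, leave each site, which is the usual motivation for combining local models instead of pooling the data.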