On the effectiveness of distributed learning on different class-probability distributions of data

  • Authors:
  • Diego Peteiro-Barral; Bertha Guijarro-Berdiñas; Beatriz Pérez-Sánchez

  • Affiliations:
  • Faculty of Informatics, University of A Coruña, A Coruña, Spain (all authors)

  • Venue:
  • CAEPIA'11: Proceedings of the 14th International Conference on Advances in Artificial Intelligence, Spanish Association for Artificial Intelligence
  • Year:
  • 2011


Abstract

The unrestrainable growth of data in many domains to which machine learning could be applied has given rise to a new field, large-scale learning, which aims to develop algorithms that are efficient and scalable with respect to requirements on computation, memory, time, and communication. A promising line of research in large-scale learning is distributed learning. It involves learning from data stored at different locations and, eventually, selecting and combining the "local" classifiers to obtain a single global answer, using one of three main approaches. This paper is concerned with a significant issue that arises when distributed data comes from several sources, each of which has a different distribution. The class-probability distribution of data (CPDD) is defined and its impact on the performance of the three combination approaches is analyzed. Results show the necessity of taking the CPDD into account, concluding that combining only related knowledge is the most appropriate way to learn in a distributed setting.
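To make the setting concrete, the following is a minimal sketch (not the paper's actual method) of one common combination approach: each "local" node trains a classifier on its own partition, and the global answer is obtained by majority vote. All names and numbers here are illustrative assumptions; the `p_class1` values simulate nodes whose class-probability distributions (CPDDs) differ.

```python
import random

random.seed(0)

# Hypothetical 1-D two-class problem: class 0 centred at 0.0, class 1 at 3.0.
# Each node draws class labels with its own probability p_class1, so the
# nodes' class-probability distributions (CPDDs) differ.
def make_node_data(n, p_class1):
    data = []
    for _ in range(n):
        label = 1 if random.random() < p_class1 else 0
        x = random.gauss(3.0 if label == 1 else 0.0, 1.0)
        data.append((x, label))
    return data

def train_centroids(data):
    """A simple local classifier: the per-class mean of the node's data."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    # Skip any class absent at this node (possible under a skewed CPDD).
    return {c: sums[c] / counts[c] for c in (0, 1) if counts[c] > 0}

def local_predict(centroids, x):
    # Nearest-centroid rule.
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Three nodes with skewed CPDDs: class 1 is rare at the first node,
# balanced at the second, and dominant at the third.
nodes = [train_centroids(make_node_data(200, p)) for p in (0.1, 0.5, 0.9)]

def global_predict(x):
    """Combine the local classifiers by majority vote."""
    votes = [local_predict(c, x) for c in nodes]
    return max(set(votes), key=votes.count)

print(global_predict(0.2), global_predict(2.9))
```

The paper's point is that such naive combination ignores the CPDD of each source; its results suggest combining only classifiers trained on related distributions instead.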