A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees

  • Authors:
  • Doina Caragea; Adrian Silvescu; Vasant Honavar

  • Affiliations:
  • Artificial Intelligence Research Laboratory, Computer Science Department, Iowa State University, 226 Atanasoff Hall, Ames, IA 50011-1040, USA. {dcaragea, silvescu, honavar}@cs.iastate.edu

  • Venue:
  • International Journal of Hybrid Intelligent Systems
  • Year:
  • 2004

Abstract

This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined.
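The exactness claim rests on a simple property: the statistics a decision tree learner needs to choose a split (counts of each class for each attribute value) can be computed locally at each site and summed. The sketch below illustrates this under stated assumptions. It assumes horizontally fragmented data (each site holds disjoint rows with the same schema) and an ID3-style information-gain criterion. The toy data and the function names `local_counts` and `information_gain` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: each example is (attribute_value, class_label).
# Assumption: horizontal fragmentation, i.e. each site holds a disjoint
# subset of rows that share the same schema.
site_a = [("sunny", "no"), ("sunny", "no"), ("rain", "yes"), ("overcast", "yes")]
site_b = [("rain", "yes"), ("rain", "no"), ("overcast", "yes"), ("sunny", "yes")]


def local_counts(rows):
    """Sufficient statistics for one site: counts of (value, class) pairs."""
    return Counter(rows)


def entropy(class_counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total) for c in class_counts.values() if c)


def information_gain(counts):
    """Information gain of splitting on the attribute, using counts only.

    Items are processed in sorted order so the floating-point result does
    not depend on the insertion order of the count table.
    """
    class_totals = Counter()
    by_value = {}
    for (value, label), c in sorted(counts.items()):
        class_totals[label] += c
        by_value.setdefault(value, Counter())[label] += c
    n = sum(class_totals.values())
    remainder = sum(sum(cc.values()) / n * entropy(cc) for cc in by_value.values())
    return entropy(class_totals) - remainder


# Distributed setting: each site ships only its counts; the learner adds them.
aggregated = local_counts(site_a) + local_counts(site_b)
# Centralized setting: pool the raw rows and count once.
centralized = local_counts(site_a + site_b)

print(aggregated == centralized)
print(information_gain(aggregated) == information_gain(centralized))
```

Only the count table crosses the network, so in this sketch the data shipped grows with the number of distinct (value, class) pairs rather than with the number of rows. This is one intuition behind the communication trade-offs the abstract mentions. The paper's actual complexity analysis, and its treatment of vertically fragmented data, are not reproduced here.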