Building predictors from vertically distributed data

  • Authors:
  • Sabine McConnell;David B. Skillicorn

  • Affiliations:
  • School of Computing, Queen's University, Kingston, Canada;School of Computing, Queen's University, Kingston, Canada

  • Venue:
  • CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 2004

Abstract

Due in part to the large volume of data available today, but more importantly to privacy concerns, data are often distributed across institutional, geographical and organizational boundaries rather than being stored in a centralized location. Data can be distributed by separating objects or attributes: in the homogeneous (horizontal) case, each site holds a subset of the objects with all of their attributes, while in the heterogeneous (vertical) case, each site holds a subset of the attributes for all objects. Ensemble approaches combine the results of a number of classifiers to produce a final classification. In this paper, we present a novel ensemble approach for data that are partitioned by attributes. We show that this method can be applied successfully to a wide range of data and can even increase classification accuracy compared to a centralized technique. Because it is an ensemble approach, our technique exchanges models or classification results instead of raw data, which makes it suitable both for physically distributed data and for privacy-preserving data mining. In addition, both final model size and runtime are typically reduced compared to a centralized model. The proposed technique is evaluated using a decision tree, a variety of datasets, and several voting schemes.
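The general idea of an ensemble over vertically partitioned data can be illustrated with a small sketch. This is not the authors' implementation. It assumes that every site has access to the class labels, it substitutes a one-level decision tree (a "stump") for the full decision tree used in the paper, and it combines the sites with an unweighted majority vote, which is only one of many possible voting schemes. The names `project`, `Stump`, `train_sites` and `predict_vote` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority(labels: Sequence[str], default: str) -> str:
    """Most frequent label; ties go to the label seen first; `default` if empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def project(row: Sequence[float], cols: Sequence[int]) -> List[float]:
    """Keep only the attributes (columns) held by one site."""
    return [row[c] for c in cols]


class Stump:
    """Hypothetical stand-in for a per-site decision tree: a one-level tree
    that picks the single (attribute, threshold) split with the fewest
    training errors among the attributes the site can see."""

    def fit(self, X: List[List[float]], y: List[str]) -> "Stump":
        overall = majority(y, y[0])
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[j] <= t]
                right = [lab for row, lab in zip(X, y) if row[j] > t]
                l_lab, r_lab = majority(left, overall), majority(right, overall)
                errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
                if best is None or errors < best[0]:
                    best = (errors, j, t, l_lab, r_lab)
        _, self.j, self.t, self.left, self.right = best
        return self

    def predict(self, row: Sequence[float]) -> str:
        return self.left if row[self.j] <= self.t else self.right


def train_sites(X, y, columns_per_site):
    """Vertical (heterogeneous) partitioning: every site holds all objects but
    only its own columns. Each site trains locally, so only the fitted model,
    never the raw attribute values, would need to leave the site.
    Assumption: the labels y are available at every site."""
    return [Stump().fit([project(r, cols) for r in X], y) for cols in columns_per_site]


def predict_vote(models, columns_per_site, row):
    """Unweighted majority vote over the per-site predictions."""
    votes = [m.predict(project(row, cols)) for m, cols in zip(models, columns_per_site)]
    return majority(votes, votes[0])


# Toy data: 6 objects, 3 attributes, one attribute per site.
X = [[1.0, 10.0, 0.0], [2.0, 20.0, 1.0], [3.0, 30.0, 0.0],
     [7.0, 70.0, 1.0], [8.0, 80.0, 1.0], [9.0, 90.0, 0.0]]
y = ["a", "a", "a", "b", "b", "b"]
sites = [[0], [1], [2]]
models = train_sites(X, y, sites)
print(predict_vote(models, sites, [2.5, 25.0, 1.0]))  # a
print(predict_vote(models, sites, [8.5, 85.0, 0.0]))  # b
```

In the second query, the site holding attribute 2 votes "a", but it is outvoted by the two other sites. Weighted or confidence-based voting schemes would change only `predict_vote`, while the local training step stays the same.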