Parallel Predictor Generation

  • Authors:
  • David B. Skillicorn

  • Venue:
  • Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
  • Year:
  • 1999

Abstract

Classification and regression are fundamental data mining techniques. The goal of such techniques is to build predictors based on a training dataset and use them to predict the properties of new data. For a wide range of techniques, combining predictors built on samples from the training dataset provides lower error rates, faster construction, or both, than a predictor built from the entire training dataset. This provides a natural parallelization strategy in which predictors based on samples are built independently and hence concurrently. We discuss the performance implications for two subclasses: those in which predictors are independent, and those in which knowing a set of predictors reduces the difficulty of finding a new one.
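
The parallelization strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The one-feature threshold classifier, the majority-vote combiner, the toy dataset, and the names `train_stump`, `build_ensemble`, and `predict` are all assumptions chosen for illustration. Each predictor is trained on its own sample with no reference to the others, which is the independent subclass the abstract mentions.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(sample):
    """Fit a one-feature threshold classifier on (x, label) pairs.

    Returns (threshold, hi_label): predict hi_label when x > threshold.
    This tiny learner is a stand-in for any real predictor (assumption).
    """
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate sample with one class only: always predict that class.
        return (float("-inf"), 0 if xs0 else 1)
    m0 = sum(xs0) / len(xs0)
    m1 = sum(xs1) / len(xs1)
    # Threshold at the midpoint of the class means; the class with the
    # larger mean is predicted above the threshold.
    return ((m0 + m1) / 2, 1 if m1 > m0 else 0)


def stump_predict(stump, x):
    threshold, hi_label = stump
    return hi_label if x > threshold else 1 - hi_label


def build_ensemble(data, n_predictors, sample_size, seed=0):
    """Draw samples up front, then train one predictor per sample concurrently."""
    rng = random.Random(seed)
    samples = [rng.sample(data, sample_size) for _ in range(n_predictors)]
    # No training task depends on another, so they can all run at once.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_stump, samples))


def predict(ensemble, x):
    """Combine the predictors by majority vote."""
    votes = Counter(stump_predict(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


# Toy training set: class 0 at x = 0..9, class 1 at x = 20..29.
data = [(float(i), 0) for i in range(10)] + [(float(i), 1) for i in range(20, 30)]
ensemble = build_ensemble(data, n_predictors=5, sample_size=12)
print(predict(ensemble, 3.0), predict(ensemble, 25.0))  # prints "0 1"
```

A thread pool keeps the sketch short, but pure-Python training code running in threads is unlikely to show a real speedup in CPython. Since `train_stump` returns a plain tuple, swapping in `ProcessPoolExecutor` (run under an `if __name__ == "__main__":` guard) would be one way to get process-level parallelism. The second subclass in the abstract, where known predictors make a new one easier to find, would need communication between tasks and is not covered by this sketch.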