A Density-Biased Sampling Technique to Improve Cluster Representativeness

  • Authors:
  • Ana Paula Appel; Adriano Arantes Paterlini; Elaine P. Sousa; Agma J. Traina; Caetano Traina, Jr.

  • Affiliations:
  • Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil; Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil; Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil; Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil; Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil

  • Venue:
  • PKDD 2007: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2007

Abstract

The volume and complexity of data collected by modern applications have grown significantly, making both data manipulation and analysis increasingly costly. Sampling is a useful data reduction technique that brings the data down to a more manageable volume. Uniform sampling has been widely used, but on datasets with a skewed cluster distribution, biased sampling gives better results. This paper presents the BBS (Biased Box Sampling) algorithm, which aims to preserve the skewed tendency of the clusters in the original data. We also present experimental results obtained with the proposed BBS algorithm.
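
The abstract does not describe how BBS works internally, so the sketch below is not the BBS algorithm. It only illustrates the general idea of density-biased sampling over boxes (grid cells), to show how such a scheme differs from uniform sampling. The function name `density_biased_sample` and the parameters `cell_size`, `target_size`, and `exponent` are assumptions made for this illustration. Each point is kept with a probability that depends on how many points share its box, scaled so that the expected sample size is roughly `target_size`.

```python
import random
from collections import defaultdict


def density_biased_sample(points, cell_size, target_size, exponent=0.5, seed=0):
    """Generic grid-based density-biased sampling (illustrative only, not BBS).

    points      -- iterable of equal-length numeric tuples
    cell_size   -- side length of each grid box (hypothetical parameter)
    target_size -- approximate expected number of sampled points
    exponent    -- 1.0 behaves like uniform sampling; smaller values
                   shift the sample toward sparsely populated boxes
    """
    rng = random.Random(seed)

    # Group points by the grid box they fall into.
    boxes = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        boxes[key].append(p)

    # The expected number of points drawn from a box with n members is
    # proportional to n ** exponent; norm rescales to target_size.
    norm = sum(len(members) ** exponent for members in boxes.values())

    sample = []
    for members in boxes.values():
        n = len(members)
        # Per-point inclusion probability, capped at 1.
        prob = min(1.0, target_size * n ** (exponent - 1) / norm)
        sample.extend(p for p in members if rng.random() < prob)
    return sample
```

With `exponent=1.0` every point has the same inclusion probability, which is uniform sampling. With a smaller exponent, points in sparse boxes are kept more often, so small clusters are less likely to vanish from the sample. The returned sample size is random and only matches `target_size` in expectation.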