A Density-Biased Sampling Technique to Improve Cluster Representativeness

Authors:
Ana Paula Appel; Adriano Arantes Paterlini; Elaine P. Sousa; Agma J. Traina; Caetano Traina, Jr.
Affiliations:
Computer Science Department - ICMC, University of São Paulo at São Carlos, Brazil (all authors)
Venue:
PKDD 2007: Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2007

Abstract

The volume and complexity of data collected by modern applications have grown significantly, making both data manipulation and analysis increasingly costly. Sampling is a useful data reduction technique that lets analysts work with a more manageable volume of data. Uniform sampling has been widely used, but on datasets exhibiting a skewed cluster distribution, biased sampling gives better results. This paper presents the BBS (Biased Box Sampling) algorithm, which aims to preserve the skewed tendency of the clusters in the original data. We also present experimental results obtained with the proposed BBS algorithm.
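The abstract contrasts uniform sampling with biased sampling but does not describe how BBS itself works, so the sketch below is NOT the BBS algorithm. It is a minimal, hypothetical illustration of generic grid-based density-biased sampling, in the spirit of earlier density-biased sampling work: the space is cut into equal cells, and a point in a cell holding n points is kept with probability proportional to n ** (exponent - 1), scaled so the expected sample size is target_size. The names grid_cell, inclusion_probabilities, density_biased_sample, cell_size, target_size and exponent are all assumptions made for illustration. With exponent = 1 this reduces to uniform sampling; with exponent < 1 sparse cells (small clusters) are over-represented relative to uniform sampling.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def grid_cell(point: Point, cell_size: float) -> Cell:
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / cell_size) for x in point)


def inclusion_probabilities(points: Sequence[Point], cell_size: float,
                            target_size: float, exponent: float) -> List[float]:
    """Per-point inclusion probability for (hypothetical) grid-based biased sampling.

    A point in a cell with n points gets probability proportional to
    n ** (exponent - 1), normalised so the probabilities sum to target_size
    unless some are clipped at 1. exponent = 1 -> uniform sampling;
    exponent < 1 -> favours sparse cells; exponent > 1 -> favours dense cells.
    """
    cells = [grid_cell(p, cell_size) for p in points]
    counts: Dict[Cell, int] = Counter(cells)
    # Sum over cells of n ** exponent equals the sum over points of n ** (exponent - 1).
    norm = sum(n ** exponent for n in counts.values())
    return [min(1.0, target_size * counts[c] ** (exponent - 1) / norm)
            for c in cells]


def density_biased_sample(points: Sequence[Point], cell_size: float,
                          target_size: float, exponent: float,
                          rng: random.Random) -> List[Point]:
    """Draw one sample by an independent Bernoulli trial per point."""
    probs = inclusion_probabilities(points, cell_size, target_size, exponent)
    return [p for p, q in zip(points, probs) if rng.random() < q]
```

For example, with 900 points in one unit cell and 100 points in another, target_size = 100 and exponent = 0.5, each point of the small cluster is kept with probability 0.25 and each point of the large one with probability 1/12, so the small cluster contributes about 25% of the expected sample instead of the 10% it would get under uniform sampling. The exponent is the knob that trades fidelity to the original proportions for better coverage of small clusters.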