Querying and Clustering Very Large Data Sets Using Dynamic Bucketing Approach

  • Authors:
  • Lixin Fu

  • Venue:
  • WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
  • Year:
  • 2002

Abstract

In this paper we present dynamic bucketing, a new and efficient approach to querying and clustering very large data sets, a topic of great theoretical and practical significance. We partition the data into equal-width buckets and, as needed, further partition dense buckets into sub-buckets by efficiently reclaiming and allocating memory space. The bucketing process adapts dynamically to the input order and distribution of the data. We propose a new data structure, the structure tree, for storing aggregate information such as histograms of the buckets and sub-buckets. Because only count values are stored, the data structure is highly aggregated and concise, and well suited to data sets with a large number of records. We also provide new query evaluation and data clustering algorithms based on structure trees. Our simulation results show that our approach is superior to current leading approaches in terms of accuracy and performance.
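
The idea of equal-width buckets that are refined into sub-buckets once they become dense can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's algorithm: the names `Bucket` and `build_tree`, the `threshold` and `fanout` parameters, and the uniform-spread rule used in `estimate` are all hypothetical, and the memory reclamation step mentioned in the abstract is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    """A node that stores only a count for the half-open range [lo, hi)."""
    lo: float
    hi: float
    count: int = 0
    children: Optional[List["Bucket"]] = None  # set once the bucket is split

    def _split(self, parts: int) -> None:
        # Create equal-width sub-buckets covering this bucket's range.
        width = (self.hi - self.lo) / parts
        self.children = [
            Bucket(self.lo + i * width, self.lo + (i + 1) * width)
            for i in range(parts)
        ]

    def _child_for(self, x: float) -> "Bucket":
        # Locate the sub-bucket by index; clamp so x == hi cannot overflow.
        n = len(self.children)
        i = int((x - self.lo) / (self.hi - self.lo) * n)
        return self.children[min(max(i, 0), n - 1)]

    def insert(self, x: float, threshold: int, fanout: int) -> None:
        self.count += 1
        if self.children is None and self.count > threshold:
            # Assumed density rule: split once the count exceeds threshold.
            # Raw values are never stored, so points counted before the
            # split stay recorded at this level only.
            self._split(fanout)
        if self.children is not None:
            self._child_for(x).insert(x, threshold, fanout)

    def estimate(self, a: float, b: float) -> float:
        """Estimate how many inserted values fall in [a, b)."""
        overlap = max(0.0, min(b, self.hi) - max(a, self.lo))
        if overlap == 0.0:
            return 0.0
        fraction = overlap / (self.hi - self.lo)
        if self.children is None:
            return self.count * fraction
        # Counts not pushed down to children are spread uniformly (assumption).
        unassigned = self.count - sum(c.count for c in self.children)
        return unassigned * fraction + sum(c.estimate(a, b) for c in self.children)


def build_tree(values, lo, hi, n_buckets=4, threshold=3, fanout=2) -> Bucket:
    """Build a count-only tree over [lo, hi); threshold must be at least 1."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    root = Bucket(lo, hi)
    root._split(n_buckets)  # top-level equal-width buckets
    for x in values:
        root.insert(x, threshold, fanout)
    return root
```

For example, `build_tree([1, 2, 3, 4, 5, 6, 7, 8, 55, 95], 0, 100)` splits only the first top-level bucket, since it is the only one whose count exceeds the assumed threshold, and `estimate(a, b)` then answers range-count queries from the stored counts alone. Because the split happens in the order the data arrives, the resulting tree depends on both the input order and the distribution of the data.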