Randomization Techniques for Data Mining Methods

  • Authors:
  • Heikki Mannila

  • Affiliations:
  • HIIT, Helsinki University of Technology and University of Helsinki, Finland

  • Venue:
  • ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2008

Abstract

Data mining research has concentrated on inventing novel methods for finding interesting information in large masses of data. This has led to many new computational tasks and some interesting algorithmic developments. However, there has been less emphasis on testing the significance of the discovered patterns or models. We discuss the issues in testing the results of data mining methods, and review some recent work on scalable algorithmic techniques for randomization tests for data mining methods. We consider suitable null models and generation algorithms for randomizing 0-1 matrices, arbitrary real-valued matrices, and segmentations. We also discuss randomization for database queries.
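As a rough illustration of a randomization test on a 0-1 matrix, the sketch below assumes one common null model: random matrices with the same row and column sums as the observed data. Samples are generated by repeatedly attempting local "swaps" of a 2x2 submatrix, and an empirical p-value is computed for a chosen statistic. The function names, the swap-count heuristic, and the example statistic (`support_01`) are hypothetical choices for this sketch, not the algorithms described in the paper.

```python
import random


def swap_randomize(matrix, num_swaps, rng=None):
    """Return a copy of a 0-1 matrix with identical row and column sums.

    Each attempt picks two 1-cells (r1, c1) and (r2, c2). If (r1, c2) and
    (r2, c1) are both 0, the 2x2 submatrix [[1, 0], [0, 1]] is flipped to
    [[0, 1], [1, 0]], which leaves every row and column sum unchanged.
    """
    if rng is None:
        rng = random.Random()
    m = [row[:] for row in matrix]  # never modify the caller's matrix
    ones = [(r, c) for r, row in enumerate(m) for c, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return m
    for _ in range(num_swaps):
        i, j = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[i], ones[j]
        if r1 == r2 or c1 == c2:
            continue  # same row or column: no valid swap
        if m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
            ones[i], ones[j] = (r1, c2), (r2, c1)  # keep the 1-cell list in sync
    return m


def empirical_p_value(matrix, statistic, num_samples=1000, num_swaps=None, seed=0):
    """Fraction of randomized matrices whose statistic is at least the observed one.

    Uses the (hits + 1) / (samples + 1) form so the p-value is never zero.
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    if num_swaps is None:
        # Hypothetical heuristic: a few swap attempts per 1-cell in the matrix.
        num_swaps = 4 * sum(map(sum, matrix))
    hits = 0
    for _ in range(num_samples):
        sample = swap_randomize(matrix, num_swaps, rng)
        if statistic(sample) >= observed:
            hits += 1
    return (hits + 1) / (num_samples + 1)


def support_01(m):
    """Example statistic: number of rows containing both column 0 and column 1."""
    return sum(1 for row in m if row[0] == 1 and row[1] == 1)


if __name__ == "__main__":
    data = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 0],
    ]
    print("observed support:", support_01(data))
    print("empirical p-value:", empirical_p_value(data, support_01, num_samples=500))
```

Fixing the margins means a pattern counts as significant only if it is unlikely given the row and column frequencies alone. The number of swap attempts needed for the chain to mix well is data-dependent, so the heuristic above should be treated as a placeholder.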