New Techniques for Data Reduction in a Database System for Knowledge Discovery Applications

  • Authors:
  • Akhil Kumar

  • Affiliations:
  • College of Business, Campus Box 419, University of Colorado, Boulder, CO 80309-0419. E-mail: akhil.kumar@colorado.edu

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 1998

Abstract

Databases store large amounts of information about consumer transactions and other kinds of transactions. This information can be used to deduce rules about consumer behavior, and the rules can in turn be used to determine company policies, for instance with regards to production, marketing and in several other areas. Since databases typically store millions of records, and each record could have up to 100 or more attributes, as an initial step it is necessary to reduce the size of the database by eliminating attributes that do not influence the decision at all or do so very minimally. In this paper we present techniques that can be employed effectively for exact and approximate reduction in a database system. These techniques can be implemented efficiently in a database system using SQL (structured query language) commands. We tested their performance on a real data set and validated them. The results showed that the classification performance actually improved with a reduced set of attributes as compared to the case when all the attributes were present. We also discuss how our techniques differ from statistical methods and other data reduction methods such as rough sets.
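
To illustrate what eliminating an attribute "that does not influence the decision at all" could look like as a SQL command, the following is a minimal sketch and not the paper's actual technique, which the abstract does not spell out. It assumes one plausible reading of exact reduction: an attribute can be dropped if no two records that agree on all remaining attributes disagree on the decision. The table `purchases`, its columns, the toy rows, and the helper `is_exactly_redundant` are all hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical toy data: three condition attributes and one decision attribute.
ATTRS = ["age_group", "region", "coupon"]
DECISION = "bought"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (age_group TEXT, region TEXT, coupon TEXT, bought TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("young", "north", "yes", "yes"),
        ("young", "south", "yes", "yes"),
        ("young", "north", "no", "no"),
        ("old", "north", "yes", "no"),
        ("old", "south", "no", "no"),
        ("old", "south", "yes", "no"),
    ],
)


def is_exactly_redundant(conn, table, attrs, decision, candidate):
    """Assumed criterion (not taken from the paper): `candidate` is redundant
    if grouping by the remaining attributes never yields a group that
    contains more than one distinct decision value."""
    remaining = [a for a in attrs if a != candidate]
    if not remaining:
        raise ValueError("need at least one remaining attribute to group by")
    group_by = ", ".join(remaining)
    # Identifiers are interpolated directly; acceptable only for trusted,
    # hard-coded column names as in this sketch.
    sql = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {group_by} FROM {table} "
        f"GROUP BY {group_by} "
        f"HAVING COUNT(DISTINCT {decision}) > 1)"
    )
    (conflicts,) = conn.execute(sql).fetchone()
    return conflicts == 0


# Which attributes could be dropped without creating conflicting records?
redundant = [
    a for a in ATTRS if is_exactly_redundant(conn, "purchases", ATTRS, DECISION, a)
]
print(redundant)
```

In this toy table the decision depends only on `age_group` and `coupon`, so `region` is the only attribute the check flags. An approximate variant might tolerate a small fraction of conflicting groups instead of requiring zero; that threshold would be a further assumption.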