Optimizing relational store for e-catalog queries: a data mining approach

  • Authors:
  • Min Wang; X. Sean Wang

  • Affiliations:
  • IBM T. J. Watson Research Center, Hawthorne, NY; George Mason University, Fairfax, Virginia

  • Venue:
  • Proceedings of the 2002 ACM symposium on Applied computing
  • Year:
  • 2002

Abstract

A frequent use of database management systems in electronic commerce is to provide electronic product catalogs (e-catalogs) that allow users to search for products of interest via constraints on attributes. An intuitively straightforward representation of an e-catalog is a single table for the whole catalog, as it is conceptually easy to maintain and query. However, any e-commerce business with a reasonably large number of products and product types usually has an e-catalog that involves a large number of attributes, due to the great variety of its products, and that at the same time contains a large number of null values, because each product has values for only a relatively small number of attributes. Because of these properties, this single-table representation does not perform well in current relational database systems. Techniques proposed in the literature to deal with this problem, namely binary and vertical schemas, fail to take advantage of inherent properties of realistic e-catalogs to provide superior performance. This paper proposes a novel decomposition method for e-catalogs based on association rule discovery, a data mining technique. The method discovers groups of attributes that frequently appear together, i.e., that are frequently used together to describe products, and generates schemas that contain these groups. This paper also reports experimental results showing the efficiency of the method.
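The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. Each product is treated as the set of attributes it has non-null values for, and an Apriori-style level-wise search finds attribute sets whose support (fraction of products using every attribute in the set) meets a threshold. The schema step is entirely hypothetical: attributes above a `common_support` threshold go to a base table, disjoint frequent groups are chosen greedily (largest first, then by support), and whatever remains would fall back to a vertical (attribute, value) table. The names `frequent_attribute_sets`, `propose_schema`, `min_support`, and `common_support` are illustrative inventions.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(attr_sets, min_support):
    """Apriori-style level-wise search. attr_sets is a list of frozensets
    (the attributes each product has values for). Returns a dict mapping
    each frequent attribute set to its support (a fraction in [0, 1])."""
    n = len(attr_sets)
    counts = Counter(a for s in attr_sets for a in s)
    frequent = {frozenset([a]): c / n for a, c in counts.items() if c / n >= min_support}
    level = list(frequent)
    k = 1
    while level:
        k += 1
        # Join step: unions of two frequent (k-1)-sets with exactly k attributes.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support for the surviving candidates with one scan of the data.
        level = []
        for cand in candidates:
            support = sum(1 for s in attr_sets if cand <= s) / n
            if support >= min_support:
                frequent[cand] = support
                level.append(cand)
    return frequent


def propose_schema(products, min_support, common_support=0.9):
    """Hypothetical decomposition: near-universal attributes form a base table,
    disjoint frequent groups each get a table, the rest go to a vertical table.
    Returns (common_attributes, groups, leftover_attributes)."""
    n = len(products)
    attr_sets = [frozenset(p) for p in products]
    counts = Counter(a for s in attr_sets for a in s)
    common = {a for a, c in counts.items() if c / n >= common_support}
    rest = [s - common for s in attr_sets]
    frequent = frequent_attribute_sets(rest, min_support)
    # Prefer larger groups, then higher support; sorted names break ties deterministically.
    ordered = sorted(frequent.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0])))
    covered, groups = set(), []
    for attrs, _support in ordered:
        if len(attrs) > 1 and not attrs & covered:
            groups.append(sorted(attrs))
            covered |= attrs
    leftover = sorted(set(counts) - common - covered)
    return sorted(common), groups, leftover


if __name__ == "__main__":
    # Toy catalog: two laptops, two books, one oddball product (made-up data).
    products = [
        {"brand": "A", "price": 999, "screen_size": 15.6, "ram_gb": 16},
        {"brand": "B", "price": 799, "screen_size": 13.3, "ram_gb": 8},
        {"brand": "C", "price": 25, "page_count": 320, "author": "X"},
        {"brand": "D", "price": 18, "page_count": 210, "author": "Y"},
        {"brand": "E", "price": 12, "color": "red"},
    ]
    common, groups, leftover = propose_schema(products, min_support=0.4)
    print(common)    # ['brand', 'price']
    print(groups)    # [['author', 'page_count'], ['ram_gb', 'screen_size']]
    print(leftover)  # ['color']
```

On the toy catalog, `brand` and `price` land in the base table, the laptop and book attributes each form their own narrow table with no nulls, and the rarely used `color` goes to the vertical fallback. The separate `common_support` threshold is a design choice of this sketch: without it, attributes shared by every product would join whichever group the greedy pass picks first, and the other groups could no longer be kept disjoint.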