Optimizing subset queries: a step towards SQL-based inductive databases for itemsets

  • Authors:
  • Cyrille Masson;Céline Robardet;Jean-François Boulicaut

  • Affiliations:
  • INSA de Lyon-LIRIS, Villeurbanne, France;INSA de Lyon-PRISMA, Villeurbanne, France;INSA de Lyon-LIRIS, Villeurbanne, France

  • Venue:
  • Proceedings of the 2004 ACM symposium on Applied computing
  • Year:
  • 2004

Abstract

Storing sets and querying them (e.g., subset queries that return all supersets of a given set) is known to be difficult within relational databases. We consider that being able to efficiently query both transactional data and materialized collections of sets by means of a standard query language is an important step towards practical inductive databases. Indeed, data mining query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which SQL servers cannot process efficiently. In this paper, we propose a new way to handle sets in relational databases. It is based on a data structure that partially encodes the inclusion relationship between sets, and it extends the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.
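
To make the idea of a bitmap key concrete, here is a minimal sketch of signature-based filtering for subset queries, in the spirit of a hash group bitmap key. It is not the extended data structure proposed in the paper, whose details the abstract does not give. The table names (`set_sig`, `set_item`), the signature width `N_BITS`, the modulo hash, and the helper functions are all assumptions made for illustration. Each set is folded into a small integer bitmap; a stored set can only be a superset of the query set if its bitmap covers every bit of the query bitmap, so a cheap bitwise test in SQL prunes most sets before an exact check.

```python
import sqlite3

N_BITS = 16  # signature width; an arbitrary choice for this sketch


def signature(items):
    """Fold a set of integer item ids into an N_BITS-wide bitmap."""
    sig = 0
    for item in items:
        sig |= 1 << (item % N_BITS)  # hypothetical hash: item id modulo N_BITS
    return sig


def build(conn, itemsets):
    """Store each set as one signature row plus one row per item."""
    conn.execute("CREATE TABLE set_sig (set_id INTEGER PRIMARY KEY, sig INTEGER)")
    conn.execute("CREATE TABLE set_item (set_id INTEGER, item INTEGER)")
    for set_id, items in itemsets.items():
        conn.execute("INSERT INTO set_sig VALUES (?, ?)", (set_id, signature(items)))
        conn.executemany(
            "INSERT INTO set_item VALUES (?, ?)", [(set_id, i) for i in items]
        )


def supersets_of(conn, query):
    """Return (candidates, exact) ids of stored sets that contain `query`."""
    q_sig = signature(query)
    # Filter step: keep sets whose bitmap covers every bit of the query bitmap.
    # Hash collisions can let false positives through, never false negatives.
    candidates = [
        row[0]
        for row in conn.execute(
            "SELECT set_id FROM set_sig WHERE (sig & ?) = ? ORDER BY set_id",
            (q_sig, q_sig),
        )
    ]
    # Verification step: exact inclusion test on the surviving candidates.
    exact = []
    for set_id in candidates:
        items = {
            row[0]
            for row in conn.execute(
                "SELECT item FROM set_item WHERE set_id = ?", (set_id,)
            )
        }
        if set(query) <= items:
            exact.append(set_id)
    return candidates, exact


conn = sqlite3.connect(":memory:")
build(conn, {1: {1, 2, 3}, 2: {2, 3}, 3: {17, 2, 5}, 4: {4, 5}})
candidates, exact = supersets_of(conn, {1, 2})
print(candidates, exact)  # prints [1, 3] [1]
```

In this toy run, set 3 passes the filter only because item 17 collides with item 1 under the modulo hash, and the verification step discards it. A wider signature reduces such collisions at the cost of a larger key.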