Probabilistic query models for transaction data

  • Authors:
  • Dmitry Pavlov; Padhraic Smyth

  • Affiliations:
  • Information and Computer Science, University of California, Irvine, Irvine, CA; Information and Computer Science, University of California, Irvine, Irvine, CA

  • Venue:
  • Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year:
  • 2001

Abstract

We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying problem: the query selectivity estimation problem (i.e., finding exact counts for tuples in a data set) and the query generalization problem (i.e., computing the probability that a tuple will occur in new data). We show that frequent itemsets are useful for reducing the original data to a compressed representation, and we introduce a method to store them using an ADTree data structure. Extending our earlier work on this topic, we propose several new schemes for query answering that are based on the compressed representation and avoid direct scans of the data at query time. Experimental results on real-world transaction data sets provide insights into the tradeoffs among the offline time for model building, the online time for query answering, the memory footprint of the compressed data, and the accuracy of the estimate returned for a query.
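
As a rough illustration of answering count queries from frequent itemsets rather than from the raw transactions, the following Python sketch may help. It is not the authors' method: it stores itemsets in a plain dictionary instead of an ADTree, handles only conjunctive "all of these items occur together" queries, and falls back to a simple independence model (not a Bayesian network, Markov random field, or mixture model) when the queried itemset is not frequent. The function names, the `min_count` threshold, and the toy data are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Count every itemset of up to max_size items by brute force and
    keep those that occur in at least min_count transactions.
    (Hypothetical helper; a real system would use Apriori or similar.)"""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


def estimate_count(query_items, itemsets, n_transactions):
    """Estimate how many transactions contain all query_items, using only
    the stored itemsets (no scan of the original data)."""
    key = tuple(sorted(set(query_items)))
    if key in itemsets:
        # The query matches a stored frequent itemset: the count is exact.
        return float(itemsets[key])
    # Assumed fallback: treat items as independent and multiply marginals.
    estimate = float(n_transactions)
    for item in key:
        estimate *= itemsets.get((item,), 0) / n_transactions
    return estimate


# Toy data, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
stored = frequent_itemsets(transactions, min_count=2)

exact = estimate_count(["milk", "bread"], stored, len(transactions))
approx = estimate_count(["butter", "milk"], stored, len(transactions))
print(exact, approx)
```

Here `("bread", "milk")` is frequent, so its count is read directly from the stored itemsets, while `("butter", "milk")` occurs only once in the toy data and is therefore estimated from the single-item counts. That gap between the true count and the estimate is the kind of accuracy-versus-memory tradeoff the abstract refers to.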