We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying problem: the query selectivity estimation problem (i.e., finding exact counts for tuples in a data set) and the query generalization problem (i.e., computing the probability that a tuple will occur in new data). We show that frequent itemsets are useful for reducing the original data to a compressed representation, and we introduce a method to store them using an ADTree data structure. Extending our earlier work on this topic, we propose several new schemes for query answering that operate on the compressed representation and avoid direct scans of the data at query time. Experimental results on real-world transaction data sets provide insights into the tradeoffs among the offline time for model building, the online time for query answering, the memory footprint of the compressed data, and the accuracy of the estimate returned for a query.
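To make the idea of answering queries from a compressed representation concrete, the following is a minimal sketch, not the method of the paper. It assumes transactions are Python sets of item labels and that a query is a conjunction of items that must all be present. The function names (`frequent_itemsets`, `exact_count`, `independence_estimate`, `estimate`), the toy data, and the support threshold are hypothetical. The fallback is a plain independence model, a much simpler stand-in for the Bayesian networks, Markov random fields, and mixture models discussed above, and a plain dictionary stands in for the ADTree storage.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of all itemsets up to max_size; keep the frequent ones.

    Returns a dict mapping frozenset -> count. This is an offline step that
    scans the data once; a real system would use a smarter mining algorithm.
    """
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def exact_count(transactions, query):
    """Ground truth: scan the data and count transactions containing the query."""
    return sum(1 for t in transactions if query <= t)


def independence_estimate(itemsets, n, query):
    """Estimate the count as n times the product of single-item frequencies."""
    est = float(n)
    for item in query:
        est *= itemsets.get(frozenset([item]), 0) / n
    return est


def estimate(itemsets, n, query):
    """Answer from the stored itemsets if possible, else assume independence.

    No scan of the original transactions happens here.
    """
    if query in itemsets:
        return float(itemsets[query])
    return independence_estimate(itemsets, n, query)


# Hypothetical toy data set of six transactions.
transactions = [
    {"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}, {"a", "b"}, {"c"},
]
n = len(transactions)
itemsets = frequent_itemsets(transactions, min_support=2)

for q in (frozenset({"a", "b"}), frozenset({"a", "c"})):
    print(sorted(q), "exact:", exact_count(transactions, q),
          "estimate:", estimate(itemsets, n, q))
```

In this toy run the query on `a` and `b` is answered exactly because that itemset is frequent and therefore stored, while the query on `a` and `c` falls back to the independence assumption and can be off. That gap is the kind of accuracy versus memory tradeoff the experiments examine, here only in miniature.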