Approximation of Frequentness Probability of Itemsets in Uncertain Data

Authors:
Toon Calders;Calin Garboni;Bart Goethals
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010

Citing 0
Cited 10

Mining frequent patterns from univariate uncertain data
Data & Knowledge Engineering

Fast approximation of probabilistic frequent closed itemsets
Proceedings of the 50th Annual Southeast Regional Conference

Fast tree-based mining of frequent itemsets from uncertain data
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I

UFIMT: an uncertain frequent itemset mining toolbox
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Mining frequent itemsets over uncertain databases
Proceedings of the VLDB Endowment

Frequent item set mining
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

A fast algorithm for frequent itemset mining using Patricia* structures
DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery

Mining frequent serial episodes over uncertain sequence data
Proceedings of the 16th International Conference on Extending Database Technology

FARP: Mining fuzzy association rules from a probabilistic quantitative database
Information Sciences: an International Journal

Summarizing probabilistic frequent patterns: a fast approach
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

Mining frequent item sets from transactional datasets is a well-known problem with good algorithmic solutions. Most of these algorithms assume that the input data is free from errors. Real data, however, is often affected by noise. Such noise can be represented by uncertain datasets in which each item has an existence probability. Recently, Bernecker et al. (2009) proposed the frequentness probability, i.e., the probability that a given item set is frequent, as a measure for selecting item sets in an uncertain database. A dynamic programming approach to evaluate this measure was given as well. We argue, however, that for the setting of Bernecker et al. (2009), which assumes independence between the items, well-known statistical tools already exist. We show how the frequentness probability can be approximated extremely accurately using a form of the central limit theorem. We experimentally evaluated our approximation and compared it to the dynamic programming approach. The evaluation shows that our approximation method is extremely accurate even for very small databases, while at the same time having much lower memory overhead and computation time.
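
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each uncertain transaction maps items to independent existence probabilities, so the support of an item set is a sum of independent Bernoulli variables. The function names (`transaction_probs`, `exact_frequentness`, `normal_frequentness`), the dictionary-based database layout, and the use of a normal approximation with continuity correction are illustrative assumptions; the abstract only says that "a form of the central limit theorem" is used.

```python
import math


def transaction_probs(db, itemset):
    """Probability that each transaction contains every item of `itemset`.

    Assumption: `db` is a list of dicts mapping item -> existence probability,
    and items are independent, so the probabilities multiply.
    """
    return [math.prod(t.get(item, 0.0) for item in itemset) for t in db]


def exact_frequentness(probs, minsup):
    """Exact P(support >= minsup) via dynamic programming over the support."""
    dist = [1.0]  # dist[k] = P(support == k) for the transactions seen so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # transaction does not contain the item set
            nxt[k + 1] += q * p       # transaction contains the item set
        dist = nxt
    return sum(dist[max(minsup, 0):])


def normal_frequentness(probs, minsup):
    """Approximate P(support >= minsup) with a normal distribution.

    Uses the mean and variance of the sum of independent Bernoulli variables
    and a continuity correction of 0.5 (an assumption of this sketch).
    """
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:  # support is deterministic
        return 1.0 if mu >= minsup else 0.0
    z = (minsup - 0.5 - mu) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail 1 - Phi(z)
```

For four transactions that each contain the item set with probability 0.5 and a minimum support of 2, `exact_frequentness` gives 0.6875 and `normal_frequentness` gives about 0.691. In this sketch the dynamic program keeps a full support distribution and takes time quadratic in the number of transactions, whereas the approximation needs only two running sums.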