Mining frequent itemsets over uncertain databases

Authors:
Yongxin Tong; Lei Chen; Yurong Cheng; Philip S. Yu
Affiliations:
Hong Kong University of Science & Technology, Hong Kong, China; Hong Kong University of Science & Technology, Hong Kong, China; Northeastern University, China; University of Illinois at Chicago
Venue:
Proceedings of the VLDB Endowment
Year:
2012

Abstract

In recent years, the wide application of uncertain data has drawn much attention to mining frequent itemsets over uncertain databases. In an uncertain database, the support of an itemset is a random variable rather than a fixed occurrence count. Thus, unlike the corresponding problem in deterministic databases, where a frequent itemset has a unique definition, a frequent itemset in an uncertain environment has so far been defined in two different ways. The first definition, referred to as the expected support-based frequent itemset, uses the expectation of an itemset's support to decide whether the itemset is frequent. The second definition, referred to as the probabilistic frequent itemset, uses the probability that the support of an itemset reaches the minimum support threshold to measure its frequentness. As a result, existing work on mining frequent itemsets over uncertain databases falls into two separate groups, and no study has comprehensively compared the two definitions. In addition, because no uniform experimental platform exists, current solutions for the same definition even report inconsistent results. In this paper, we first aim to clarify the relationship between the two definitions. Through extensive experiments, we verify that the two definitions are tightly connected and can be unified when the data is large enough. Second, we provide baseline implementations of eight existing representative algorithms and test their performance fairly under uniform measures. Finally, based on these fair tests over many different benchmark data sets, we clarify several existing inconsistent conclusions and discuss some new findings.
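
The two definitions contrasted in the abstract can be illustrated with a minimal Python sketch. It is not one of the eight algorithms evaluated in the paper. It assumes a toy uncertain database in which each transaction maps an item to its existence probability, and in which items and transactions are independent. The names `containment_probs`, `expected_support`, `frequent_probability`, and `toy_db`, as well as all the numbers, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List

# Assumption: each transaction maps an item to its existence probability,
# and items and transactions are mutually independent.
UncertainDB = List[Dict[str, float]]


def containment_probs(db: UncertainDB, itemset: Iterable[str]) -> List[float]:
    """Probability that each transaction contains every item of the itemset."""
    items = list(itemset)
    probs = []
    for transaction in db:
        p = 1.0
        for item in items:
            p *= transaction.get(item, 0.0)  # a missing item has probability 0
        probs.append(p)
    return probs


def expected_support(db: UncertainDB, itemset: Iterable[str]) -> float:
    """Expectation of the support (first definition)."""
    return sum(containment_probs(db, itemset))


def frequent_probability(db: UncertainDB, itemset: Iterable[str],
                         min_count: int) -> float:
    """Pr[support >= min_count] (second definition), by dynamic programming."""
    # dist[k] holds Pr[support == k] over the transactions processed so far.
    dist = [1.0]
    for p in containment_probs(db, itemset):
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain itemset
            nxt[k + 1] += mass * p       # transaction contains itemset
        dist = nxt
    return sum(dist[min_count:])


# Hypothetical toy database with four uncertain transactions.
toy_db: UncertainDB = [
    {"a": 0.8, "b": 0.5},
    {"a": 0.6},
    {"a": 0.9, "b": 0.4},
    {"b": 1.0},
]

if __name__ == "__main__":
    print(f"expected support of {{a}}: {expected_support(toy_db, ['a']):.3f}")
    print(f"Pr[support of {{a}} >= 2]: {frequent_probability(toy_db, ['a'], 2):.3f}")
```

On this toy database the itemset {a} has expected support 2.3, and its support reaches 2 with probability 0.876. Under the first definition, {a} is frequent if 2.3 meets the chosen minimum expected support. Under the second, it is frequent if 0.876 exceeds a chosen probability threshold.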