Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Authors:
Jianzhong Li; Zhaonian Zou; Hong Gao
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (all three authors)
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2012

Abstract

Frequent subgraph mining has been studied extensively on certain graph data. In practice, however, uncertainty is intrinsic to graph data, yet there is very little work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called $\varphi$-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers $0 < \varphi, \tau < 1$, the goal is to quickly find all subgraphs with $\varphi$-frequent probability at least $\tau$. Because the problem is NP-hard and computing the $\varphi$-frequent probability of a subgraph is #P-hard, an approximate mining algorithm is proposed to produce an $(\varepsilon, \delta)$-approximate set $\Pi$ of "frequent subgraphs", where $0 < \varepsilon < \tau$ is the error tolerance and $0 < \delta < 1$ is a confidence bound. The algorithm guarantees that (1) any frequent subgraph $S$ is contained in $\Pi$ with probability at least $((1 - \delta)/2)^{s}$, where $s$ is the number of edges in $S$; and (2) any infrequent subgraph with $\varphi$-frequent probability less than $\tau - \varepsilon$ is contained in $\Pi$ with probability at most $\delta/2$. The theoretical analysis shows that to obtain any frequent subgraph with probability at least $1 - \Delta$, the input parameter $\delta$ of the algorithm must be set to at most $1 - 2(1 - \Delta)^{1/\ell_{\max}}$, where $0 < \Delta < 1$ and $\ell_{\max}$ is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is efficient in practice and achieves very high approximation quality. Moreover, this paper is the first to discuss the difference between the probabilistic semantics and the expected semantics for mining frequent subgraphs over uncertain graph data.
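The abstract does not spell out the uncertainty model, so the following is only a minimal sketch under stated assumptions, not the paper's approximate mining algorithm. It assumes (hypothetically) that the uncertain graphs are independent, that graph $i$ contains a pattern $S$ with a known probability `p[i]` (obtaining `p[i]` may itself be expensive when $S$ has many possible embeddings), and that the support of $S$ is the fraction of graphs containing it. Under these assumptions the number of graphs containing $S$ follows a Poisson binomial distribution, so the $\varphi$-frequent probability can be computed with a small dynamic program; a brute-force enumeration of all $2^n$ possible worlds is included as a cross-check, `expected_support` gives the expected-semantics counterpart, and `max_delta` evaluates the bound on $\delta$ stated above. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import ceil


def min_support_count(n, phi):
    """Smallest k with k / n >= phi, computed exactly to avoid float rounding."""
    return ceil(Fraction(phi).limit_denominator(10**6) * n)


def phi_frequent_probability(p, phi):
    """Pr[support of S >= phi], where graph i contains S with probability p[i].

    Assumption (hypothetical model): graphs are independent, so the count of
    graphs containing S is Poisson binomial and a simple DP is exact.
    """
    dist = [1.0]  # dist[k] = Pr[exactly k of the graphs seen so far contain S]
    for pi in p:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - pi)  # graph i does not contain S
            nxt[k + 1] += mass * pi      # graph i contains S
        dist = nxt
    return sum(dist[min_support_count(len(p), phi):])


def phi_frequent_probability_bruteforce(p, phi):
    """Same quantity by enumerating all 2^n possible worlds (exponential)."""
    k_min = min_support_count(len(p), phi)
    total = 0.0
    for world in product((0, 1), repeat=len(p)):  # world[i] == 1: graph i contains S
        prob = 1.0
        for bit, pi in zip(world, p):
            prob *= pi if bit else 1.0 - pi
        if sum(world) >= k_min:
            total += prob
    return total


def expected_support(p):
    """Expected-semantics counterpart: E[support of S] is the mean of p."""
    return sum(p) / len(p)


def max_delta(big_delta, l_max):
    """Largest delta with ((1 - delta) / 2) ** l_max >= 1 - big_delta.

    Guarantee (1) weakens as s grows, so s = l_max is the binding case.
    A non-positive result means no delta in (0, 1) meets the target.
    """
    return 1.0 - 2.0 * (1.0 - big_delta) ** (1.0 / l_max)


if __name__ == "__main__":
    spread = [0.5, 0.5, 0.5, 0.5]  # hypothetical containment probabilities
    skewed = [1.0, 1.0, 0.0, 0.0]
    for name, p in (("spread", spread), ("skewed", skewed)):
        print(name, expected_support(p), phi_frequent_probability(p, 0.5))
    # spread 0.5 0.6875
    # skewed 0.5 1.0
```

On these two hypothetical inputs both patterns have the same expected support (0.5), yet their $\varphi$-frequent probabilities at $\varphi = 0.5$ differ (0.6875 versus 1.0), which illustrates how the probabilistic and expected semantics can disagree. The exact `Fraction` threshold is a design choice: computing `ceil(phi * n)` in floating point can be off by one (for example, `0.1 * 10` is not guaranteed to round the way one expects in every case).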