Incremental sequence-based frequent query pattern mining from XML queries

Authors:
Guoliang Li;Jianhua Feng;Jianyong Wang;Lizhu Zhou
Affiliations:
Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Existing algorithms for mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy, which involves expensive candidate enumeration and costly tree-containment checking. Furthermore, most existing methods periodically recompute the frequencies of candidate query patterns from scratch by scanning the entire transaction database, which consists of XQPs derived from user query logs. However, maintaining the discovered frequent patterns in real XML databases is not straightforward, because frequent updates may not only invalidate some existing frequent query patterns but also generate new ones. Existing methods are therefore inefficient when the transaction database evolves. To address these problems, this paper proposes an efficient algorithm, ESPRIT, to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i+, to incrementally mine frequent XQPs. We also devise several novel optimization techniques for query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and significantly outperform state-of-the-art methods.
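
The idea of mapping query trees to sequences and keeping frequencies up to date as the query log grows can be illustrated with a minimal sketch. It is not the ESPRIT encoding, which the abstract does not specify. It assumes a query tree is a `(label, children)` pair and uses a preorder `(label, depth)` sequence, which is one-to-one for ordered labeled trees. The names `encode`, `decode`, and `IncrementalCounter` are hypothetical, and for simplicity the counter tracks only whole-query patterns rather than all sub-patterns.

```python
from collections import Counter

# Assumption: a query tree is (label, [children]); the paper's own
# representation is not given in the abstract.

def encode(tree, depth=0):
    """Return the preorder (label, depth) sequence of an ordered labeled tree."""
    label, children = tree
    seq = [(label, depth)]
    for child in children:
        seq.extend(encode(child, depth + 1))
    return tuple(seq)

def decode(seq):
    """Rebuild the tree from its (label, depth) sequence (inverse of encode)."""
    root = (seq[0][0], [])
    stack = [root]  # stack[d] is the most recent node seen at depth d
    for label, depth in seq[1:]:
        node = (label, [])
        del stack[depth:]                 # drop nodes at this depth or deeper
        stack[depth - 1][1].append(node)  # attach to the current parent
        stack.append(node)
    return root

class IncrementalCounter:
    """Hypothetical helper: keeps sequence counts across batches of queries."""

    def __init__(self, min_support):
        self.min_support = min_support  # fraction of all queries seen so far
        self.counts = Counter()
        self.total = 0

    def add(self, trees):
        # Only the new batch is scanned; earlier counts are reused.
        for tree in trees:
            self.counts[encode(tree)] += 1
            self.total += 1

    def frequent(self):
        threshold = self.min_support * self.total
        return {seq for seq, c in self.counts.items() if c >= threshold}

q1 = ("book", [("title", []), ("author", [("name", [])])])
q2 = ("book", [("title", [])])

counter = IncrementalCounter(min_support=0.5)
counter.add([q1, q1, q2])
first = counter.frequent()    # only q1 is frequent at this point
counter.add([q2, q2, q2])
second = counter.frequent()   # the update invalidates q1 and promotes q2
```

The second batch shows the effect the abstract describes: an update to the transaction database can both invalidate a previously frequent pattern and make a new one frequent, without rescanning the earlier queries.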