Incremental sequence-based frequent query pattern mining from XML queries

  • Authors:
  • Guoliang Li;Jianhua Feng;Jianyong Wang;Lizhu Zhou

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084;Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China 100084

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.