Finding hot query patterns over an XQuery stream

Authors:
Huai Yang;Li Lee;Wynne Hsu
Affiliations:
School of Computing, National University of Singapore, Singapore and School of Electronics Engineering and Computer Science, Peking University, P.R. China;School of Computing, National University of Singapore, Singapore;School of Computing, National University of Singapore, Singapore
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2004

Abstract

Caching query results is one efficient approach to improving the performance of XML management systems. This entails the discovery of frequent XML queries issued by users. In this paper, we model user queries as a stream of XML query pattern trees and mine the frequent query patterns over the query stream. To facilitate the one-pass mining process, we devise a novel data structure called DTS to summarize the pattern trees seen so far. By grouping the incoming pattern trees into batches, we can dynamically mark the active portion of the current batch in DTS and limit the enumeration of candidate trees to only the currently active pattern trees. We also design another summary data structure called ECTree that provides for the incremental computation of the frequent tree patterns over the query stream. Based on the above two constructs, we present two mining algorithms called XQSMinerI and XQSMinerII. XQSMinerI is fast, but it tends to overestimate, while XQSMinerII adopts a filter-and-refine approach to minimize the amount of overestimation. Experimental results show that the proposed methods are both efficient and scalable and require only small memory footprints.
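The batch-wise, one-pass counting idea described above can be illustrated with a much simpler stand-in. The sketch below is not DTS, ECTree, or either XQSMiner algorithm, whose details the abstract does not give. It assumes each query pattern tree is a `(label, children)` tuple, counts whole patterns only (no candidate subtree enumeration), and substitutes generic lossy counting (Manku and Motwani) for the summary structures. The names `canonical`, `BatchedPatternCounter`, `epsilon`, and `min_support` are hypothetical.

```python
import math


def canonical(tree):
    """Return a canonical string for an unordered labeled tree.

    Assumption: a tree is a (label, [children]) tuple. Sorting the child
    encodings makes sibling order irrelevant.
    """
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"


class BatchedPatternCounter:
    """Lossy counting over a stream of pattern trees, pruned once per batch.

    This is a generic stand-in, not the paper's DTS/ECTree design.
    """

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # batch (bucket) size
        self.n = 0                           # patterns seen so far
        self.bucket = 1                      # id of the current batch
        self.entries = {}                    # key -> [count, max undercount]

    def add(self, tree):
        key = canonical(tree)
        self.n += 1
        if key in self.entries:
            self.entries[key][0] += 1
        else:
            # A new key may have been missed in up to (bucket - 1) earlier batches.
            self.entries[key] = [1, self.bucket - 1]
        if self.n % self.width == 0:
            # Batch boundary: drop keys that cannot be frequent so far.
            self.entries = {
                k: v for k, v in self.entries.items()
                if v[0] + v[1] > self.bucket
            }
            self.bucket += 1

    def frequent(self, min_support):
        """Report patterns whose stored count reaches (min_support - epsilon) * n.

        The relaxed threshold can admit patterns whose true support is
        slightly below min_support, so the result may contain false positives.
        """
        threshold = (min_support - self.epsilon) * self.n
        return {k: c for k, (c, _) in self.entries.items() if c >= threshold}
```

A caller would feed each incoming query pattern to `add` and call `frequent` whenever the cache needs refreshing. Because the reported set can include patterns just under the support threshold, a second verification pass over the survivors, in the spirit of a filter-and-refine step, would be one way to tighten it.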