From Searching Text to Querying XML Streams

Authors:
Dan Suciu
Affiliations:
-
Venue:
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Year:
2002

Citing 15
Cited 0

Introduction to algorithms

Introduction to algorithms
From structured documents to novel query facilities

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Proximal nodes: a model to query document databases by content and structure

ACM Transactions on Information Systems (TOIS)
XMill: an efficient compressor for XML data

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient string matching: an aid to bibliographic search

Communications of the ACM
Monitoring XML data on the Web

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Mesh-based content routing using XML

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Handbook of Formal Languages

Handbook of Formal Languages
Introduction To Automata Theory, Languages, And Computation

Introduction To Automata Theory, Languages, And Computation
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Mind Your Grammar: a New Approach to Modelling Text

VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Efficient Filtering of XML Documents for Selective Dissemination of Information

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Efficient Filtering of XML Documents with XPath Expressions

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II

Quantified Score

Hi-index	0.00

Visualization

Abstract

XML data is queried with XPath expressions, which are a limited form of regular expressions. New XML stream processing applications, such as content-based routing or selective dissemination of information, require thousands or millions of XPath expressions to be evaluated simultaneously on the incoming XML stream at a high, sustained rate.Con ceptually, the XPath evaluation problem is analogous to the text search problem, in which one or several regular expressions need to be matched to a given text, but the number of regular expressions here is much larger, while the "text" is much shorter, since it corresponds to the depth of the XML stream. In this paper we examine techniques that have been proposed for XML stream processing, which are variations of either a non-deterministic or a deterministic finite automata (NFA and DFA). For the latter, we describe a series or theoretical results establishing lower and upper bounds on the number of DFA states for sets of XPath expressions.