From searching text to querying XML streams

Authors:
Dan Suciu
Affiliations:
University of Washington, Department of Computer Science, 114 Sieg Hall, Box 352350, Seattle, WA
Venue:
Journal of Discrete Algorithms - SPIRE 2002
Year:
2004

Abstract

XML data is queried with a limited form of regular expressions, in a language called XPath. New XML stream processing applications, such as content-based routing and selective dissemination of information, require thousands or millions of XPath expressions to be evaluated simultaneously on the incoming XML stream at a high, sustained rate. In its simplest approximation, the XPath evaluation problem is analogous to the text search problem, in which one or several regular expressions must be matched against a given text. At a finer level, it is related to the tree pattern matching problem. Unlike the traditional text search setting, however, the number of regular expressions here is much larger, while the "text" is much shorter: it is the path from the root to the current element, so its length is bounded by the depth of the XML stream. In this paper we examine techniques that have been proposed for XML stream processing and describe a few open problems.
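As a rough illustration of the text-search analogy, the Python sketch below treats each linear XPath expression (child `/` and descendant `//` steps with name tests or `*`, no predicates) as a small regular expression over tag names, and runs all of them at once over a stream of start/end events. The event tuples and the names `compile_xpath`, `step`, and `filter_stream` are hypothetical, chosen for this example rather than taken from the paper; a real system would receive events from a SAX-style parser. The point to notice is that the "text" each query sees is only the root-to-element path, so the stack holds one NFA state set per currently open element.

```python
from typing import Iterable, List, Set, Tuple

Step = Tuple[str, str]   # (axis, name test): axis is "/" or "//", name may be "*"
State = Tuple[int, int]  # (query index, number of steps matched so far)


def compile_xpath(expr: str) -> List[Step]:
    """Split a linear XPath such as "/a//b/*" into (axis, name) steps."""
    steps: List[Step] = []
    i = 0
    while i < len(expr):
        if expr.startswith("//", i):
            axis, i = "//", i + 2
        elif expr[i] == "/":
            axis, i = "/", i + 1
        else:
            raise ValueError(f"expected '/' or '//' at position {i} in {expr!r}")
        j = i
        while j < len(expr) and expr[j] != "/":
            j += 1
        if j == i:
            raise ValueError(f"empty name test in {expr!r}")
        steps.append((axis, expr[i:j]))
        i = j
    return steps


def step(states: Set[State], tag: str, queries: List[List[Step]]) -> Set[State]:
    """NFA transition for all queries at once on entering an element named `tag`."""
    nxt: Set[State] = set()
    for q, i in states:
        if i == len(queries[q]):
            continue  # accepting state: no outgoing transitions
        axis, name = queries[q][i]
        if axis == "//":
            nxt.add((q, i))  # descendant axis: this element may be skipped
        if name == "*" or name == tag:
            nxt.add((q, i + 1))  # name test matches: advance one step
    return nxt


def filter_stream(events: Iterable[Tuple[str, str]], xpaths: List[str]) -> List[int]:
    """Return indices of the XPath expressions that match at least one element."""
    queries = [compile_xpath(x) for x in xpaths]
    # Initial state set, then one state set pushed per open element.
    stack: List[Set[State]] = [{(q, 0) for q in range(len(queries))}]
    matched: Set[int] = set()
    for kind, tag in events:
        if kind == "start":
            top = step(stack[-1], tag, queries)
            stack.append(top)
            matched.update(q for q, i in top if i == len(queries[q]))
        elif kind == "end":
            stack.pop()  # leaving the element restores the parent's states
    return sorted(matched)


if __name__ == "__main__":
    # Events for the document <a><b><c/></b><d/></a>
    events = [("start", "a"), ("start", "b"), ("start", "c"), ("end", "c"),
              ("end", "b"), ("start", "d"), ("end", "d"), ("end", "a")]
    xpaths = ["/a/b", "//c", "/a/c", "/a//c", "/*/d", "//b/d"]
    print(filter_stream(events, xpaths))  # [0, 1, 3, 4]
```

Here `/a/c` and `//b/d` are rejected because `c` is not a child of `a` and `d` is not a child of `b`. The cost of each event grows with the total number of active states across all expressions, which is what becomes a bottleneck with millions of queries; one refinement proposed in the literature is to cache transitions between whole state sets, in effect building a deterministic automaton lazily.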