XML-document-filtering automaton

Authors:
Panu Silvasti;Seppo Sippu;Eljas Soisalon-Soininen
Affiliations:
Helsinki University of Technology;-;-
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Citing 13
Cited 1

Efficient string matching: an aid to bibliographic search

Communications of the ACM
Mesh-based content routing using XML

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Optimizing Regular Path Expressions Using Graph Schemas

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Adding Structure to Unstructured Data

ICDT '97 Proceedings of the 6th International Conference on Database Theory
Efficient Filtering of XML Documents for Selective Dissemination of Information

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Efficient filtering of XML documents with XPath expressions

The VLDB Journal — The International Journal on Very Large Data Bases
Stream processing of XPath queries with predicates

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Light-weight xPath processing of XML stream with deterministic automata

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Path sharing and predicate evaluation for high-performance XML filtering

ACM Transactions on Database Systems (TODS)
Processing XML streams with deterministic automata and stream indexes

ACM Transactions on Database Systems (TODS)
FiST: scalable XML document filtering by sequencing twig patterns

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Fast and scalable pattern matching for content filtering

Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Multipattern string matching with q-grams

Journal of Experimental Algorithmics (JEA)

Schema-conscious filtering of XML documents

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Abstract

In a publish-subscribe system based on filtering of XML documents, subscribers specify their interests with profiles expressed in the XPath language. The system processes a stream of XML documents and delivers to subscribers a notification or the content of the documents that match the profiles. We present a new XML-document-filtering algorithm that is based on the classic Aho-Corasick pattern-matching automaton. The automaton has a size linear in the sum of the sizes of the filters. We assume that the XML documents all conform to a given DTD; our algorithm utilizes the DTD in the preprocessing phase of the automaton to prune out descendant axes (//) and wildcards (*) from the XPath filters. The XPath subset currently supported consists of linear XPath expressions without predicates. In the case of a 683 MB protein-sequence database, we obtained a throughput of 18.8 MB/sec for 50 000 filters and 17.0 MB/sec for 500 000 filters, using a SAX parser with a throughput of 27 MB/sec.
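
To make the idea in the abstract concrete, the following is a minimal Python sketch of an Aho-Corasick-style automaton whose alphabet is element names rather than characters. It is not the authors' construction: the DTD-based preprocessing that removes // and * is not shown, and the sketch assumes that each filter has already been reduced to a plain sequence of child steps that must match as a suffix of the current root-to-element path. The names `PathAutomaton`, `FilterHandler`, and `matching_filters` are hypothetical and chosen only for illustration.

```python
import xml.sax
from collections import deque


class PathAutomaton:
    """Aho-Corasick automaton over element names (illustrative sketch)."""

    def __init__(self, filters):
        # State 0 is the root. goto[s] maps an element name to the next state.
        self.goto = [{}]
        self.fail = [0]
        self.out = [set()]
        # Phase 1: build the keyword tree; its size is linear in the
        # total number of steps in all filters.
        for fid, steps in enumerate(filters):
            s = 0
            for name in steps:
                if name not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[s][name] = len(self.goto) - 1
                s = self.goto[s][name]
            self.out[s].add(fid)
        # Phase 2: compute failure links breadth-first.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for name, t in self.goto[s].items():
                f = self.fail[s]
                while f and name not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(name, 0)
                # Inherit the filters recognized at the failure state.
                self.out[t] |= self.out[self.fail[t]]
                queue.append(t)

    def step(self, state, name):
        """Follow failure links until a transition on `name` exists."""
        while state and name not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(name, 0)


class FilterHandler(xml.sax.ContentHandler):
    """Drives the automaton from SAX events, one state per open element."""

    def __init__(self, automaton):
        super().__init__()
        self.automaton = automaton
        self.stack = [0]      # automaton states along the current path
        self.matched = set()  # indexes of filters matched so far

    def startElement(self, name, attrs):
        state = self.automaton.step(self.stack[-1], name)
        self.stack.append(state)
        self.matched |= self.automaton.out[state]

    def endElement(self, name):
        # Backtrack to the state of the parent element.
        self.stack.pop()


def matching_filters(xml_text, filters):
    """Return the set of indexes of filters matched by the document."""
    handler = FilterHandler(PathAutomaton(filters))
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched


if __name__ == "__main__":
    filters = [("a", "b"), ("b", "c"), ("c",), ("x", "y")]
    print(sorted(matching_filters("<a><b><c/></b><d/></a>", filters)))
```

In this sketch the stack of states mirrors the stack of open elements, so an end-element event simply restores the parent's state; whether the paper handles backtracking the same way is an assumption here.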