Efficient string matching: an aid to bibliographic search
Communications of the ACM
Mesh-based content routing using XML
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Optimizing Regular Path Expressions Using Graph Schemas
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Adding Structure to Unstructured Data
ICDT '97 Proceedings of the 6th International Conference on Database Theory
Efficient Filtering of XML Documents for Selective Dissemination of Information
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Efficient filtering of XML documents with XPath expressions
The VLDB Journal — The International Journal on Very Large Data Bases
Stream processing of XPath queries with predicates
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Light-weight xPath processing of XML stream with deterministic automata
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Path sharing and predicate evaluation for high-performance XML filtering
ACM Transactions on Database Systems (TODS)
Processing XML streams with deterministic automata and stream indexes
ACM Transactions on Database Systems (TODS)
FiST: scalable XML document filtering by sequencing twig patterns
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Fast and scalable pattern matching for content filtering
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Multipattern string matching with q-grams
Journal of Experimental Algorithmics (JEA)
Schema-conscious filtering of XML documents
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
In a publish-subscribe system based on the filtering of XML documents, subscribers specify their interests with profiles expressed in the XPath language. The system processes a stream of XML documents and delivers to subscribers a notification for, or the content of, each document that matches their profiles. We present a new XML-document-filtering algorithm that is based on the classic Aho-Corasick pattern-matching automaton. The automaton has a size linear in the sum of the sizes of the filters. We assume that the XML documents all conform to a given DTD; our algorithm utilizes the DTD in the preprocessing phase of the automaton to prune out descendant axes (//) and wildcards (*) from the XPath filters. The XPath subset currently supported consists of linear XPath expressions without predicates. On a 683 MB protein-sequence database, we obtained a throughput of 18.8 MB/sec for 50 000 filters and 17.0 MB/sec for 500 000 filters, using a SAX parser with a throughput of 27 MB/sec.
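The general idea of running an Aho-Corasick automaton over element names rather than characters can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes every filter has already been reduced to the form //t1/t2/.../tk (a leading descendant axis followed only by child steps), and it omits the DTD-based pruning step entirely. The names PathAutomaton, FilterHandler and matching_filters are hypothetical, introduced only for this illustration.

```python
from collections import deque
import xml.sax


class PathAutomaton:
    """Aho-Corasick automaton whose alphabet is element names (assumed design)."""

    def __init__(self, filters):
        self.goto = [{}]    # goto[state][tag] -> next state
        self.fail = [0]     # failure link of each state
        self.out = [set()]  # ids of the filters recognized in each state
        # Build the keyword trie: one keyword per filter, one symbol per step.
        for fid, path in enumerate(filters):
            state = 0
            for tag in path.strip("/").split("/"):
                if tag not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[state][tag] = len(self.goto) - 1
                state = self.goto[state][tag]
            self.out[state].add(fid)
        # Compute failure links breadth-first, merging outputs along them.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for tag, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and tag not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(tag, 0)
                self.out[t] |= self.out[self.fail[t]]

    def step(self, state, tag):
        """Follow failure links until a goto transition on tag exists."""
        while state and tag not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(tag, 0)


class FilterHandler(xml.sax.ContentHandler):
    """Keeps one automaton state per open element on a stack."""

    def __init__(self, automaton):
        super().__init__()
        self.automaton = automaton
        self.states = [0]
        self.matched = set()

    def startElement(self, name, attrs):
        state = self.automaton.step(self.states[-1], name)
        self.states.append(state)
        self.matched |= self.automaton.out[state]

    def endElement(self, name):
        # Leaving an element restores the state of its parent.
        self.states.pop()


def matching_filters(filters, document):
    """Return the filters matched by at least one element of the document."""
    handler = FilterHandler(PathAutomaton(filters))
    xml.sax.parseString(document.encode("utf-8"), handler)
    return sorted(filters[i] for i in handler.matched)
```

In this sketch the trie has at most one state per filter step plus the root, which mirrors the linear size bound stated above. For example, `matching_filters(["//a/b", "//b/c", "//c/d"], "<a><b><c/></b><b/></a>")` reports the first two filters: the second is found through a failure link from the state for a/b to the state for b.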