Prefiltering techniques for efficient XML document processing

Authors:
Chia-Hsin Huang; Tyng-Ruey Chuang; Hahn-Ming Lee
Affiliations:
National Taiwan University of Science and Technology, Taipei, Taiwan and Academia Sinica, Taipei, Taiwan; Academia Sinica, Taipei, Taiwan; National Taiwan University of Science and Technology, Taipei, Taiwan
Venue:
Proceedings of the 2005 ACM symposium on Document engineering
Year:
2005

Abstract

The Document Object Model (DOM) and the Simple API for XML (SAX) are the two major programming models for XML document processing. Each, however, has its own efficiency limitation. DOM assumes an in-core representation of XML documents, which can be problematic for large documents. SAX must scan the document linearly in order to locate the fragments of interest. Previously, we used tree-to-table mapping and indexing techniques to help answer structural queries over large XML documents and large collections of XML documents. In this paper, we generalize those techniques into a prefiltering framework in which repeated access to large XML documents can be carried out efficiently within the existing DOM and SAX models. The prefiltering framework essentially uses a tiny search engine to locate useful fragments in the target XML documents by approximately executing the user's queries. Those fragments are gathered into a candidate-set XML document, which is returned to the user's DOM- or SAX-based application for further processing. This results in a practical and efficient model of XML processing, especially when the XML documents are large and infrequently updated but frequently queried.
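
The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag-name index, the function names `build_tag_index` and `prefilter`, the `candidates` root element, and the sample document are all assumptions made for illustration. The index here is held in memory for brevity, whereas a practical prefilter would presumably persist it so that the full document need not be reparsed for each query.

```python
import copy
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document (not from the paper).
DOC = (
    "<library>"
    "<shelf><book lang='en'><title>A</title></book>"
    "<magazine lang='en'><title>M</title></magazine></shelf>"
    "<shelf><book lang='fr'><title>B</title></book>"
    "<book lang='en'><title>C</title></book></shelf>"
    "</library>"
)

def build_tag_index(root):
    """One-time pass: map each tag name to the elements carrying it."""
    index = defaultdict(list)
    for elem in root.iter():
        index[elem.tag].append(elem)
    return index

def prefilter(index, tag):
    """Approximate query: select fragments by tag name only.

    Predicates are ignored at this stage, so the result may contain
    false positives that the application filters out afterwards.
    """
    candidates = ET.Element("candidates")  # assumed wrapper element
    for elem in index.get(tag, []):
        candidates.append(copy.deepcopy(elem))
    return candidates

# Phase 1: build the index once, then prefilter per query.
root = ET.fromstring(DOC)
index = build_tag_index(root)
candidates = prefilter(index, "book")

# Phase 2: the application runs the exact query on the small
# candidate-set document using an ordinary DOM-style API.
exact = candidates.findall("book[@lang='en']")
titles = [book.findtext("title") for book in exact]
print(titles)  # ['A', 'C']
```

In this sketch the prefilter returns all three `book` fragments, and the exact query applied by the application keeps only the two English ones. The design choice being illustrated is that the approximate step may over-select, since the downstream DOM- or SAX-based application still performs the precise evaluation.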