2LP: A double-lazy XML parser

Authors:
Fernando Farfán;Vagelis Hristidis;Raju Rangaswami
Affiliations:
School of Computing and Information Sciences, Florida International University, 11200 SW 8th Street, Miami, FL 33199, United States;School of Computing and Information Sciences, Florida International University, 11200 SW 8th Street, Miami, FL 33199, United States;School of Computing and Information Sciences, Florida International University, 11200 SW 8th Street, Miami, FL 33199, United States
Venue:
Information Systems
Year:
2009

Abstract

XML is acknowledged as the most effective format for data encoding and exchange across domains ranging from the World Wide Web to desktop applications. However, its large-scale adoption in real system implementations is being slowed by the inefficiency of its document-parsing methods. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have a key drawback: they must load the entire XML document to extract the overall document structure before any parsing can be performed. We have developed a framework for efficient parsing based on the idea of placing internal physical pointers within the XML document, which allow the navigation process to skip large portions of the document during parsing. We show how to generate such internal pointers in a way that optimizes parsing, using constructs supported by the current W3C XML standard. A double-lazy parser (2LP) exploits these internal pointers to parse the document efficiently. Because the internal pointers are built from W3C-supported constructs, 2LP is backward compatible; that is, pointer-augmented documents can still be parsed by current XML parsers. We also implemented a mechanism to efficiently parse large documents with limited main memory, thereby overcoming a major limitation of current solutions. We study our pointer generation and parsing algorithms both theoretically and experimentally, and show that they perform considerably better than existing approaches.
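
To make the pointer idea concrete, the following is a minimal sketch of how a parser might skip subtrees it never navigates into. It is not the authors' implementation. The abstract does not say which W3C construct carries the pointers, so the use of an XInclude-style `include` element here is an assumption, as are the `LazyDocument` class, the partition names, and the in-memory `PARTITIONS` dictionary that stands in for separately stored document fragments.

```python
import xml.etree.ElementTree as ET

# Assumption: pointers are written as XInclude-style elements.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical partitions of one logical document, keyed by name.
# A real system would keep these as separate files or byte ranges on disk.
PARTITIONS = {
    "root.xml": (
        '<site xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<regions><xi:include href="regions.xml"/></regions>'
        '<people><xi:include href="people.xml"/></people>'
        "</site>"
    ),
    "regions.xml": "<europe><item id='i1'/><item id='i2'/></europe>",
    "people.xml": "<person id='p1'><name>Ada</name></person>",
}


class LazyDocument:
    """Loads a partition only when navigation actually reaches its pointer."""

    def __init__(self, partitions, root_name):
        self.partitions = partitions
        self.loaded = []  # names of partitions parsed so far, in order
        self.root = self._load(root_name)

    def _load(self, name):
        self.loaded.append(name)
        return ET.fromstring(self.partitions[name])

    def children(self, element):
        """Return the children of element, resolving pointers at this level only."""
        resolved = []
        for index, child in enumerate(list(element)):
            if child.tag == XI_INCLUDE:
                target = self._load(child.get("href"))
                # Swap the pointer for the loaded subtree at the same position.
                element.remove(child)
                element.insert(index, target)
                child = target
            resolved.append(child)
        return resolved

    def find_path(self, tags):
        """Follow a simple child path such as ['people', 'person'] from the root."""
        node = self.root
        for tag in tags:
            node = next((c for c in self.children(node) if c.tag == tag), None)
            if node is None:
                return None
        return node


if __name__ == "__main__":
    doc = LazyDocument(PARTITIONS, "root.xml")
    person = doc.find_path(["people", "person"])
    print(person.get("id"))  # p1
    print(doc.loaded)        # ['root.xml', 'people.xml']
```

In this sketch the `regions.xml` partition is never parsed, because the path never descends into `regions`. A conventional parser that understands the same include construct could still expand every pointer and see the whole document, which is the kind of backward compatibility the abstract describes.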